More on odd spider behavior of my own IP


Bill Ross-2
Does anyone watch the spiders? I actually feed them: I use some interesting math to derive the file creation times I publish, which are unrelated to the actual files, in order to watch how they probe. So it was a big surprise to see spider-like behavior coming from behind my home router. The remaining possibilities are my backend Ubuntu GPU deep learning box and my router, assuming it is unlikely that an intruder on my router (none is shown connected now) would go after my website in slow motion, to the tune of ~10 requests over ~4 days. The innocuous explanation I can come up with is some sort of buggy spontaneous cache refresh, but two different patterns have been seen, so I remain hooked on the problem.

Here I analyze the timing and ridiculousness of the requests in the Jetty log, a pleasant distraction from about 5 OS installs over the last 48 hours or so. Who knows what my FB friends thought of these posts. :-)

Any ideas about the cause would be welcome. Odd to see a stranger in the mirror.

Bill

---

I looked at my website's log and noticed that my laptop [more accurately, my IP address] had downloaded 3 files twice within a second, as if it were a Google bot probing my website, something that would take me some effort to do myself; I know it couldn't have been me, for a few reasons elucidated below. I had recently installed Apple's latest update to El Capitan (for the CPU exploits that have been in the news), and also [thought I might have picked up a virus on FB]. I called Apple, but their help people could not understand the concept of detecting a virus in a website log, or that it could mean there was a virus in the latest bugfix they had just pushed.

So I reported it to the authorities (2018-USCERTv33LDPI), wiped my laptop and upgraded to the next operating system, High Sierra, then changed all my passwords.

If I hadn't had access to my own website's logs and recognized my IP address, things would seem fine, but maybe all my money would disappear at some point. Obviously I can't hope to pick up stuff like this in any predictable way, so the only answer I can think of is to stay poor and change passwords regularly.

If I had written something nefarious that did what I saw, it would be a test probe in an attack on my website (likely while rifling the laptop), analyzing the timestamps on multiple copies of the files. As it happens, I derive those timestamps with some entertaining math: since no one else is interested in Phobrain, I at least provide intellectual food for the spiders.

Here are the time stamps for the requests from my address, from the log. I think no one should graduate high school these days without being able to spot that these all happen in < 1 second and make no sense for a browser to do.

01/13 06:20:49.810 INFO - Mapping expt.html
01/13 06:20:49.982 INFO - Mapping view.html
01/13 06:20:50.181 INFO - Mapping favicon.ico
01/13 06:20:50.257 INFO - Mapping expt.html
01/13 06:20:50.340 INFO - Mapping view.html
01/13 06:20:50.482 INFO - Mapping favicon.ico
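The deltas can be checked mechanically. A minimal Python sketch, assuming the six log lines above exactly as quoted and assuming the log year is 2018 (the year of this thread; the log itself omits it): it parses each timestamp and prints the gap between consecutive requests and the span of the whole burst. The helper names are illustrative only.

```python
from datetime import datetime, timedelta

# The six "Mapping" lines quoted above. The log omits the year; 2018 is assumed.
LOG = """\
01/13 06:20:49.810 INFO - Mapping expt.html
01/13 06:20:49.982 INFO - Mapping view.html
01/13 06:20:50.181 INFO - Mapping favicon.ico
01/13 06:20:50.257 INFO - Mapping expt.html
01/13 06:20:50.340 INFO - Mapping view.html
01/13 06:20:50.482 INFO - Mapping favicon.ico"""

MS = timedelta(milliseconds=1)


def parse(log: str, year: int = 2018) -> list[tuple[datetime, str]]:
    """Return (timestamp, file name) for each log line."""
    entries = []
    for line in log.splitlines():
        date, clock, *_, name = line.split()
        stamp = datetime.strptime(f"{year}/{date} {clock}", "%Y/%m/%d %H:%M:%S.%f")
        entries.append((stamp, name))
    return entries


def deltas_ms(entries: list[tuple[datetime, str]]) -> list[tuple[str, str, int]]:
    """Whole milliseconds between each request and the one before it."""
    return [
        (prev_name, name, (stamp - prev_stamp) // MS)
        for (prev_stamp, prev_name), (stamp, name) in zip(entries, entries[1:])
    ]


if __name__ == "__main__":
    entries = parse(LOG)
    for prev_name, name, ms in deltas_ms(entries):
        print(f"{prev_name:12} -> {name:12} {ms:4d} ms")
    print(f"whole burst: {(entries[-1][0] - entries[0][0]) // MS} ms")
```

Run as-is, it reproduces the 172 and 199 ms gaps discussed below, plus 76, 83 and 142 ms for the later pairs; the whole burst spans 672 ms.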

Thinking further, note the deltas: e.g., the two HTML pages load within (982-810=) 172 milliseconds, implying < 86 millis each way. Using a utility called 'ping', I see the raw network time between the laptop and the site right now:

$ ping phobrain.com
PING phobrain.com (70.32.90.126) 56(84) bytes of data.
64 bytes from 70.32.90.126: icmp_seq=1 ttl=52 time=71.1 ms
64 bytes from 70.32.90.126: icmp_seq=2 ttl=52 time=71.3 ms
64 bytes from 70.32.90.126: icmp_seq=3 ttl=52 time=71.0 ms
^C

That means (86-71=) 15 millis spent calculating per direction, or 30 milliseconds total for the laptop to be thinking between requests. As it happens, expt.html is my original view page, which I pointed at view.html around the middle of last year, so it would be natural to follow the link; I'll keep the 30 millis in mind.

Then we have (50.181-49.982=) 199 millis from view.html to favicon.ico (which is an 'asset' of the view.html page), so that's an extra (199-172=) 27 millis over the processing time that led to calling for view.html, likely because view.html is 3x the size of expt.html (you can see for yourself! :-).

With that in mind, we are ready to answer the burning question: were the repeat loads timed the same as the initial ones? And when was the decision made to launch the second round?

Roundtrip times, estimated calc times:
First: 172/30, 199/57

First favicon.ico to Second expt.html: (50.257-50.181=) 76/<0, so the second round must have been started before favicon.ico was received.
First view.html to Second expt.html: (50.257-49.982=) 275/133

So it spent 100 millis longer thinking about whether to repeat than it did deciding to load favicon.ico and not a bunch of other assets that normally get loaded. Now let's look at the roundtrip times of the next series, to see whether they were all fetched at once or the same process was followed.

Roundtrip times, estimated calc times:
Second: 83/<0, 142/0, veddy interesting.
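The table above follows a simple model: each interval between two logged requests is charged two network legs of the ~71 ms ping time, and the remainder is treated as "calc" time (which lumps together server time, transfer time and client think time). A minimal Python sketch of that arithmetic, with the intervals copied from the log; the labels, the `legs` parameter and the helper names are illustrative only. One caveat when reading the numbers: ping's time= figure is already a round trip, so charging two ping times per interval roughly matches a fresh TCP connection per request (the handshake costs an extra round trip), while requests on one reused keep-alive connection would account for only about one; `legs=1` reruns the table under that reading.

```python
# The model used in the table: an interval between two logged requests is
# `legs` network legs of PING_MS each, plus leftover "calc" time.
PING_MS = 71  # ping to phobrain.com in the first message: 71.0 to 71.3 ms

# (pair of requests, interval in ms), copied from the log deltas.
INTERVALS = [
    ("First expt.html -> view.html", 172),
    ("First view.html -> favicon.ico", 199),
    ("First favicon.ico -> Second expt.html", 76),
    ("First view.html -> Second expt.html", 275),
    ("Second expt.html -> view.html", 83),
    ("Second view.html -> favicon.ico", 142),
]


def calc_ms(interval_ms: int, legs: int = 2, ping_ms: int = PING_MS) -> int:
    """Leftover time after subtracting `legs` network legs of `ping_ms` each."""
    return interval_ms - legs * ping_ms


def show(value: int) -> str:
    """Render negative leftovers as "<0", as in the table."""
    return "<0" if value < 0 else str(value)


if __name__ == "__main__":
    for legs in (2, 1):
        print(f"legs={legs}")
        for label, interval in INTERVALS:
            print(f"  {label:40} {interval}/{show(calc_ms(interval, legs))}")
```

With the default `legs=2` it reproduces the 30, 57, 133 and 0 figures and the two "<0" entries; with `legs=1` the first interval leaves 101 ms instead of 30.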

---

Spider-like behavior also happens after a wipe and fresh install of High Sierra: below, one JPEG is fetched twice without any web page, with no other activity at the time. This time I haven't installed Chrome; I'm using Firefox and Safari instead. I'm going to try installing Linux.


The last apparently human activity, a series of normal back-and-forths with multiple loads:

01/16 07:18:15.653 INFO com.priot.servlet.GetMult - REQ 0/v r 0 repeat false

Then the first odd load without a page:

01/16 07:46:09.933 INFO c.p.s.FileSystemResourceServlet - Mapping rodin.jpg

Next seems like me, as above:

01/16 08:24:40.393 INFO com.priot.servlet.GetMult - GetMult POST
...
01/16 08:25:26.891 INFO com.priot.servlet.GetMult - REQ 0/v r 0 repeat false

Then the apparent robot:

01/16 08:55:01.199 INFO c.p.s.FileSystemResourceServlet - Mapping rodin.jpg

Eventually, probably me again:

01/16 17:36:22.601 INFO com.priot.servlet.GetMult - GetMult POST

---

I installed Ubuntu on the MacBook, and with no macOS around, I just saw another unexplained load of the same image, rodin.jpg, that was loaded yesterday at about the same time (01/16 07:46:09.933 and 08:55:01.199 with macOS, 01/17 07:47:23.052 with Ubuntu).
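The gaps between the three unexplained rodin.jpg fetches are easy to get wrong by hand. A small Python sketch, assuming the log year is 2018 and using only the three timestamps quoted above: it prints each fetch's offset from the first one, and how far the Ubuntu fetch lands from exactly 24 hours after the first macOS fetch. The helper names are illustrative only.

```python
from datetime import datetime, timedelta

# The three rodin.jpg fetches quoted above; the log omits the year, so 2018 is assumed.
FETCHES = [
    ("01/16 07:46:09.933", "High Sierra"),
    ("01/16 08:55:01.199", "High Sierra"),
    ("01/17 07:47:23.052", "Ubuntu"),
]
ONE_DAY = timedelta(days=1)


def stamp(text: str, year: int = 2018) -> datetime:
    """Parse an 'MM/DD HH:MM:SS.mmm' log timestamp."""
    return datetime.strptime(f"{year}/{text}", "%Y/%m/%d %H:%M:%S.%f")


def gaps(fetches: list[tuple[str, str]]) -> list[timedelta]:
    """Time between each fetch and the previous one."""
    times = [stamp(text) for text, _ in fetches]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    first = stamp(FETCHES[0][0])
    for text, os_name in FETCHES:
        print(f"{text}  {os_name:12} +{stamp(text) - first}")
    ubuntu = stamp(FETCHES[-1][0])
    print(f"Ubuntu fetch minus (first fetch + 24 h): {ubuntu - first - ONE_DAY}")
```

The second macOS fetch comes about 1 h 9 min after the first, and the Ubuntu fetch lands about 1 min 13 s later in the day than the first one.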



_______________________________________________
jetty-users mailing list
[hidden email]
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/jetty-users

Re: More on odd spider behavior of my own IP

Simone Bordet-3
Hi,

On Thu, Jan 18, 2018 at 10:45 AM, Bill <[hidden email]> wrote:
>
> Does anyone watch the spiders?

[snip]

Have you tried using Wireshark to identify the client socket being opened,
and lsof/ss/netstat on your machines to find out which process
opened that socket?
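A minimal Python sketch of the lsof/ss/netstat half of that suggestion, assuming Linux with iproute2's ss, whose -p option prints owning processes in the form users:(("name",pid=N,fd=M)); this sketch only understands that pid= form and skips anything else. The owners() helper is hypothetical, and SITE_IP is the address ping reported for phobrain.com in the first message. Run it as root to see other users' processes, and note that a one-shot snapshot will miss a connection that has already closed.

```python
import re
import shutil
import subprocess

SITE_IP = "70.32.90.126"  # phobrain.com, as resolved by ping earlier in the thread

# One owner entry in ss's -p column, e.g. ("firefox",pid=2817,fd=84).
OWNER_RE = re.compile(r'\("([^"]*)",pid=(\d+),fd=(\d+)\)')


def owners(ss_output: str, remote_ip: str):
    """Yield (process, pid, local, peer) for each socket whose peer is remote_ip."""
    for line in ss_output.splitlines():
        idx = line.find("users:")
        if idx < 0:
            continue  # header, or a socket ss could not attribute to a process
        cols = line[:idx].split()  # State Recv-Q Send-Q Local Peer
        if len(cols) < 5:
            continue
        local, peer = cols[3], cols[4]
        # Strip the port, IPv6 brackets and the IPv4-mapped prefix, if any.
        host = peer.rsplit(":", 1)[0].strip("[]").removeprefix("::ffff:")
        if host != remote_ip:
            continue
        for name, pid, _fd in OWNER_RE.findall(line[idx:]):
            yield name, int(pid), local, peer


if __name__ == "__main__":
    if shutil.which("ss"):
        # -t TCP only, -n numeric addresses, -p owning process (root sees all).
        result = subprocess.run(["ss", "-tnp"], capture_output=True, text=True)
        for hit in owners(result.stdout, SITE_IP):
            print(hit)
    else:
        print("ss not found; this sketch assumes Linux with iproute2")
```

The process name is taken from everything after users: rather than from a whitespace-split column, because names such as Firefox's "Web Content" contain spaces.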

--
Simone Bordet
----
http://cometd.org
http://webtide.com
Developer advice, training, services and support
from the Jetty & CometD experts.

Re: More on odd spider behavior of my own IP

Bill Ross-2
I haven't watched the wire in a long time; I guess it would be possible to
filter for packets going to my website. No action in the log. Ideally a
monitor process would filter and check the process ID on the spot.
Taking a look at Wireshark.
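A minimal sketch of such a monitor in Python, assuming Linux (for instance the Ubuntu install mentioned in the first message) and root privileges. It uses the same socket-inode matching that lsof relies on: read /proc/net/tcp for IPv4 sockets whose remote address is the site, then look for matching socket:[inode] links under /proc/<pid>/fd. The function names and the poll-count argument are hypothetical. It only scans IPv4 (not /proc/net/tcp6), can miss a connection that opens and closes between polls, and only sees processes on the machine it runs on, so the GPU box would need its own copy.

```python
import os
import socket
import struct
import sys
import time
from datetime import datetime

SITE_IP = "70.32.90.126"  # phobrain.com, as resolved by ping earlier in the thread


def decode(field: str) -> tuple[str, int]:
    """Decode an IPv4 'HEXADDR:HEXPORT' field from /proc/net/tcp.

    The kernel prints the address as a host-order 32-bit number, so packing it
    back in native byte order recovers the network-order bytes.
    """
    addr_hex, port_hex = field.split(":")
    return socket.inet_ntoa(struct.pack("=I", int(addr_hex, 16))), int(port_hex, 16)


def parse_tcp_table(text: str, remote_ip: str) -> dict[int, str]:
    """Map socket inode -> 'local -> remote' for sockets whose peer is remote_ip."""
    found = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        local, remote, inode = decode(fields[1]), decode(fields[2]), int(fields[9])
        if remote[0] == remote_ip and inode:  # inode 0 (e.g. TIME_WAIT) has no owner
            found[inode] = f"{local[0]}:{local[1]} -> {remote[0]}:{remote[1]}"
    return found


def owners_of(inodes: dict[int, str]):
    """Yield (pid, command, inode) for processes holding one of the sockets."""
    wanted = {f"socket:[{inode}]": inode for inode in inodes}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            fds = os.listdir(f"/proc/{pid}/fd")
        except OSError:  # process exited, or belongs to another user
            continue
        for fd in fds:
            try:
                target = os.readlink(f"/proc/{pid}/fd/{fd}")
            except OSError:
                continue
            if target in wanted:
                try:
                    with open(f"/proc/{pid}/comm") as f:
                        command = f.read().strip()
                except OSError:
                    command = "?"
                yield int(pid), command, wanted[target]


if __name__ == "__main__":
    polls = int(sys.argv[1]) if len(sys.argv) > 1 else 1  # polls, ~0.2 s apart
    for i in range(polls):
        try:
            with open("/proc/net/tcp") as f:
                sockets = parse_tcp_table(f.read(), SITE_IP)
        except OSError:
            print("/proc/net/tcp not readable; this sketch assumes Linux")
            break
        if sockets:
            for pid, command, inode in owners_of(sockets):
                stamp = datetime.now().isoformat(timespec="milliseconds")
                print(stamp, pid, command, sockets[inode])
        if i + 1 < polls:
            time.sleep(0.2)
```

Printing a timestamp with each hit is meant to make it easy to line a hit up against the "Mapping ..." entries in the Jetty log.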


On 01/18/2018 11:20 PM, Simone Bordet wrote:

> Hi,
>
> On Thu, Jan 18, 2018 at 10:45 AM, Bill <[hidden email]> wrote:
>> Does anyone watch the spiders?
> [snip]
>
> Have you tried using wireshark to know the client socket being opened
> and the lsof/ss/netstat on your machines to understand what process
> opened that socket ?
>


Re: More on odd spider behavior of my own IP

Bill Ross-2
It looks like Wireshark doesn't display the PID, but it will give useful
info if I see another probe in the server log.

Thanks!


On 01/19/2018 12:46 AM, Bill wrote:

> I haven't watched the wire for a long time, I guess it'd be possible
> to filter for packets going to my website. No action in the log.
> Ideally a monitor process would filter and check the process id on the
> spot. Taking a look at wireshark.
>
>
> On 01/18/2018 11:20 PM, Simone Bordet wrote:
>> Hi,
>>
>> On Thu, Jan 18, 2018 at 10:45 AM, Bill <[hidden email]> wrote:
>>> Does anyone watch the spiders?
>> [snip]
>>
>> Have you tried using wireshark to know the client socket being opened
>> and the lsof/ss/netstat on your machines to understand what process
>> opened that socket ?
>>


Re: More on odd spider behavior of my own IP

Bill Ross-2

My default explanation now is that somehow Jetty or Ubuntu supplied the wrong IPs for logging.

Bill


On 1/18/18 1:45 AM, Bill wrote:
[snip]