[jetty-users] More on odd spider behavior of my own IP

Discussion:

Bill

2018-01-18 09:45:22 UTC

Does anyone watch the spiders? I actually feed them by using interesting
math to derive file creation times that I publish, unrelated to the
actual file, in order to watch how they probe. So it was a big surprise
to see spider-like behavior coming from behind my home router. The
remaining possibilities are my backend Ubuntu GPU deep learning box,
and my router, assuming it unlikely that an intruder on my router (none
shown connected now) would go after my website in slow motion to the
tune of ~10 requests over ~4 days. The innocuous explanation I can come
up with is that some sort of buggy spontaneous cache refresh is
involved, but two different patterns have been seen, so I remain hooked
on the problem.

Here I analyze the timing and ridiculousness of the requests in the
jetty log, a pleasant distraction from about 5 OS installs over the last
48 hours or so. Who knows what my FB friends thought of these posts. :-)

Any ideas about the cause would be welcome. Odd to see a stranger in the
mirror.

Bill

---

I looked at my website's log and noticed that my laptop [more
accurately, my IP address] had downloaded 3 files twice within a second,
as if it was a google bot probing my website, something that would take
me some effort to do myself that way; I know it couldn't have been me
for a few reasons elucidated below. I had recently installed Apple's new
version of the El Capitan operating system (for the CPU exploits that
have been in the news), and also [thought I might have picked up a virus
on FB]. I called Apple, but their help people could not understand the
concept of detecting a virus in a web site log, or the fact it could
mean that there is a virus in the latest bugfix they just pushed.

So I reported it to the authorities (2018-USCERTv33LDPI), wiped my
laptop and upgraded to the next operating system, High Sierra, then
changed all my passwords.

If I hadn't had access to my own website's logs and recognized my IP
address, things would seem fine, but maybe all my money would disappear
at some point. Obviously I can't hope to pick up stuff like this in any
predictable way, so the only answer I can think of is to stay poor and
change passwords regularly.

If I wrote something nefarious that did what I saw, it would be part of
an attack on my website (likely while rifling the laptop) as a test
probe to analyze the timestamps on multiple copies of the files, which
as it happens I have come up with some entertaining math to derive -
since no one else is interested in Phobrain, I provide intellectual food
for the spiders, at least.

Here are the time stamps for the requests from my address, from the log.
I think no one should graduate high school these days without being able
to spot that these all happen in < 1 second and make no sense for a
browser to do.

01/13 06:20:49.810 INFO - Mapping expt.html
01/13 06:20:49.982 INFO - Mapping view.html
01/13 06:20:50.181 INFO - Mapping favicon.ico
01/13 06:20:50.257 INFO - Mapping expt.html
01/13 06:20:50.340 INFO - Mapping view.html
01/13 06:20:50.482 INFO - Mapping favicon.ico

Thinking further: note the deltas, e.g. the 2 html pages load within
(982-810=) 172 milliseconds, implying < 86 millis each way. Using a
utility called 'ping' I see the raw net time between laptop and site
right now:

$ ping phobrain.com
PING phobrain.com (70.32.90.126) 56(84) bytes of data.
64 bytes from 70.32.90.126: icmp_seq=1 ttl=52 time=71.1 ms
64 bytes from 70.32.90.126: icmp_seq=2 ttl=52 time=71.3 ms
64 bytes from 70.32.90.126: icmp_seq=3 ttl=52 time=71.0 ms
^C

Which means (86-71=) 15 millis spent calcing per direction, or 30
milliseconds total for the laptop to be thinking between requests. As it
happens, expt.html is my original view page, which I pointed at
view.html around the middle of last year, so it would be natural to
follow the link, and I'll keep the 30 millis in mind.

Then we have (50.181-49.982=) 199 millis from view.html to favicon.ico
(which is an 'asset' of the view.html page), so that's an extra
(199-172=) 27 millis over the processing time that led to calling for
view.html, likely because view.html is 3x the size of expt.html (you can
see for yourself! :-).

With that in mind, we are ready to answer the burning question: were the
repeat loads timed the same as the initial ones? And when was the
decision made to launch the second round?

Roundtrip times, estimated calc times:
First: 172/30, 199/57

First favicon.ico to Second: 76/<0: Second must have been started before
favicon.ico was received, but
First view.html to Second expt.html: (50.257-49.982=) 275/133

So it spent 100 millis longer thinking about whether to repeat than it
did to decide to load favicon.ico and not a bunch of other assets that
normally get loaded. Now let's look at the roundtrip times of the next
series, to see if they were all sucked at once, or the same process was
followed.

Roundtrip times, estimated calc times:
Second: 83/<0, 142/0, veddy interesting.
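
The deltas above can be recomputed mechanically. A minimal sketch in Python, assuming the year 2018 (the log lines carry no year) and using the post's allowance of 2 x 71 ms = 142 ms of network time per step. Since ping reports a round-trip time, a 71 ms allowance may be closer to reality, so the value is left as a parameter (the names LOG, YEAR and think_times are illustrative only):

```python
from datetime import datetime, timedelta

YEAR = 2018  # assumption: not present in the log, taken from the thread date

# Log lines from the post.
LOG = """\
01/13 06:20:49.810 INFO - Mapping expt.html
01/13 06:20:49.982 INFO - Mapping view.html
01/13 06:20:50.181 INFO - Mapping favicon.ico
01/13 06:20:50.257 INFO - Mapping expt.html
01/13 06:20:50.340 INFO - Mapping view.html
01/13 06:20:50.482 INFO - Mapping favicon.ico
"""

def parse(line):
    """Return (timestamp, file name) for one log line."""
    fields = line.split()
    stamp = f"{YEAR}/{fields[0]} {fields[1]}"
    return datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S.%f"), fields[-1]

def think_times(log, network_ms=142):
    """Delta between consecutive requests, and delta minus a network allowance."""
    rows = [parse(line) for line in log.splitlines() if line.strip()]
    out = []
    for (t0, f0), (t1, f1) in zip(rows, rows[1:]):
        delta = (t1 - t0) // timedelta(milliseconds=1)  # whole milliseconds
        out.append((f0, f1, delta, delta - network_ms))
    return out

for f0, f1, delta, think in think_times(LOG):
    print(f"{f0:12} -> {f1:12} {delta:4d} ms, est. think time {think:4d} ms")
```

Calling think_times(LOG, network_ms=71) instead shows how the estimates shift if only one round trip is charged per step; a negative value means the request arrived sooner than the allowance permits.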

---

Spider-like behavior is also happening with a wipe and install of High
Sierra: below I show one jpeg being fetched without a web page twice,
and no other activity at the time. In this case I haven't installed
Chrome, using Firefox and Safari instead. I'm going to try installing Linux.

Last seeming-human activity, involving a series of normal
back-and-forths with multiple loads:

01/16 07:18:15.653 INFO com.priot.servlet.GetMult - REQ 0/v r 0 repeat false

Then the first odd load without a page:

01/16 07:46:09.933 INFO c.p.s.FileSystemResourceServlet - Mapping rodin.jpg

Next seems like me, as above:

01/16 08:24:40.393 INFO com.priot.servlet.GetMult - GetMult POST
...
01/16 08:25:26.891 INFO com.priot.servlet.GetMult - REQ 0/v r 0 repeat false

Then the apparent robot:

01/16 08:55:01.199 INFO c.p.s.FileSystemResourceServlet - Mapping rodin.jpg

Eventually likely-me again:

01/16 17:36:22.601 INFO com.priot.servlet.GetMult - GetMult POST

---

I installed Ubuntu on the Macbook, and with no MacOS around, I just saw
another unexplained load of the same page that was loaded yesterday at
about the same time (01/16 07:46:09.933 and 08:55:01.199 with MacOS,
01/17 07:47:23.052 with Ubuntu).
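
To put a number on "about the same time", here is a small Python sketch that measures how far the Ubuntu load drifts from exactly 24 hours after the first macOS load. It assumes the year 2018, which the log lines do not carry, and the helper names are illustrative only:

```python
from datetime import datetime, timedelta

YEAR = 2018  # assumption: not present in the log, taken from the thread date

def parse(stamp):
    return datetime.strptime(f"{YEAR}/{stamp}", "%Y/%m/%d %H:%M:%S.%f")

# The three unexplained loads quoted above.
loads = [parse(s) for s in (
    "01/16 07:46:09.933",  # with macOS
    "01/16 08:55:01.199",  # with macOS
    "01/17 07:47:23.052",  # with Ubuntu
)]

def offset_from_daily(earlier, later):
    """How far `later` is from the nearest whole number of days after `earlier`."""
    day = timedelta(days=1)
    gap = later - earlier
    return gap - round(gap / day) * day

same_day_gap = loads[1] - loads[0]
drift = offset_from_daily(loads[0], loads[2])
print("gap between the two 01/16 loads:", same_day_gap)
print("drift from a 24 h period:", drift)
```

A drift of about a minute over a day would fit a scheduled or cache-refresh style job as easily as anything hostile, so the number narrows the question rather than answering it.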

Simone Bordet

2018-01-19 07:20:50 UTC

Hi,

Post by Bill
Does anyone watch the spiders?

[snip]

Have you tried using wireshark to identify the client socket being
opened, and lsof/ss/netstat on your machines to find out which process
opened that socket?

--
Simone Bordet
----
http://cometd.org
http://webtide.com
Developer advice, training, services and support
from the Jetty & CometD experts.

Bill

2018-01-19 08:46:15 UTC

I haven't watched the wire for a long time; I guess it'd be possible to
filter for packets going to my website. No action in the log so far.
Ideally a monitor process would filter and check the process id on the
spot. Taking a look at wireshark.
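
One way such a monitor might look, as a rough Python sketch rather than a tested tool: it polls lsof for TCP connections to the server address from the ping output and prints the owning pid and command the first time each connection is seen. The function names and the 50 ms polling interval are assumptions made for illustration. Polling can miss a connection that opens and closes between two polls, and lsof generally needs root to see other users' processes.

```python
import subprocess
import time

SERVER_IP = "70.32.90.126"  # address reported by ping earlier in the thread

def parse_lsof_fields(text):
    """Parse `lsof -F pcn` field output into (pid, command, connection) tuples."""
    pid, cmd, found = None, None, []
    for line in text.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":        # start of a new process set
            pid, cmd = int(value), None
        elif tag == "c":      # command name
            cmd = value
        elif tag == "n":      # connection, e.g. local->remote
            found.append((pid, cmd, value))
    return found

def snapshot(ip=SERVER_IP):
    """Run lsof once and return the current connections to `ip`."""
    result = subprocess.run(
        ["lsof", "-nP", "-i", f"TCP@{ip}", "-F", "pcn"],
        capture_output=True, text=True, check=False,  # lsof exits 1 if nothing matches
    )
    return parse_lsof_fields(result.stdout)

def watch(ip=SERVER_IP, interval=0.05):
    """Poll until interrupted, printing each new (pid, command, connection)."""
    seen = set()
    while True:
        for entry in snapshot(ip):
            if entry not in seen:
                seen.add(entry)
                print(time.strftime("%m/%d %H:%M:%S"), *entry)
        time.sleep(interval)
```

Running watch() in a terminal on each machine behind the router and comparing its timestamps with the jetty log would show whether any local process made the request at all.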

Bill

2018-01-19 10:47:55 UTC

It looks like wireshark doesn't display the pid, but it will give useful
info if I see another probe in the server log.

Thanks!

_______________________________________________
jetty-users mailing list
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/jetty-users

Bill Ross

2018-02-13 00:39:38 UTC

My default explanation now is that somehow jetty or ubuntu supplied the
wrong IPs for logging.

Bill
