Bill
2018-01-18 09:45:22 UTC
Does anyone watch the spiders? I actually feed them by using interesting
math to derive file creation times that I publish, unrelated to the
actual file, in order to watch how they probe. So it was a big surprise
to see spider-like behavior coming from behind my home router. The
remaining possibilities are my backend Ubuntu GPU deep learning box,
and my router, assuming it unlikely that an intruder on my router (none
shown connected now) would go after my website in slow motion to the
tune of ~10 requests over ~4 days. The innocuous explanation I can come
up with is that some sort of buggy spontaneous cache refresh is
involved, but two different patterns have been seen, so I remain hooked
on the problem.
Here I analyze the timing and ridiculousness of the requests in the
jetty log, a pleasant distraction from about 5 OS installs over the last
48 hours or so. Who knows what my FB friends thought of these posts. :-)
Any ideas about the cause would be welcome. Odd to see a stranger in the
mirror.
Bill
---
I looked at my website's log and noticed that my laptop [more
accurately, my IP address] had downloaded 3 files twice within a second,
as if it was a google bot probing my website, something that would take
me some effort to do myself that way; I know it couldn't have been me
for a few reasons elucidated below. I had recently installed Apple's new
version of the El Capitan operating system (for the CPU exploits that
have been in the news), and also [thought I might have picked up a virus
on FB]. I called Apple, but their help people could not understand the
concept of detecting a virus in a web site log, or the fact it could
mean that there is a virus in the latest bugfix they just pushed.
So I reported it to the authorities (2018-USCERTv33LDPI), wiped my
laptop and upgraded to the next operating system, High Sierra, then
changed all my passwords.
If I hadn't had access to my own website's logs and recognized my IP
address, things would seem fine, but maybe all my money would disappear
at some point. Obviously I can't hope to pick up stuff like this in any
predictable way, so the only answer I can think of is to stay poor and
change passwords regularly.
If I wrote something nefarious that did what I saw, it would be part of
an attack on my website (likely while rifling the laptop) as a test
probe to analyze the timestamps on multiple copies of the files, which
as it happens I have come up with some entertaining math to derive -
since no one else is interested in Phobrain, I provide intellectual food
for the spiders, at least.
Here are the time stamps for the requests from my address, from the log.
I think no one should graduate high school these days without being able
to spot that these all happen in < 1 second and make no sense for a
browser to do.
01/13 06:20:49.810 INFO - Mapping expt.html
01/13 06:20:49.982 INFO - Mapping view.html
01/13 06:20:50.181 INFO - Mapping favicon.ico
01/13 06:20:50.257 INFO - Mapping expt.html
01/13 06:20:50.340 INFO - Mapping view.html
01/13 06:20:50.482 INFO - Mapping favicon.ico
Thinking further: note the deltas, e.g. the 2 html pages load within
(982-810=) 172 milliseconds, implying < 86 millis each way. Using a
utility called 'ping' I see the raw net time between laptop and site
right now:
$ ping phobrain.com
PING phobrain.com (70.32.90.126) 56(84) bytes of data.
64 bytes from 70.32.90.126: icmp_seq=1 ttl=52 time=71.1 ms
64 bytes from 70.32.90.126: icmp_seq=2 ttl=52 time=71.3 ms
64 bytes from 70.32.90.126: icmp_seq=3 ttl=52 time=71.0 ms
^C
Which means (86-71=) 15 millis spent calcing per direction, or 30
milliseconds total for the laptop to be thinking between requests. As it
happens, expt.html is my original view page, which I pointed at
view.html around the middle of last year, so it would be natural to
follow the link, and I'll keep the 30 millis in mind.
Then we have (50.181-49.982=) 199 millis from view.html to favicon.ico
(which is an 'asset' of the view.html page), so that's an extra
(199-172=) 27 millis over the processing time that led to calling for
view.html, likely because view.html is 3x the size of expt.html (you can
see for yourself! :-).
With that in mind, we are ready to answer the burning question: were the
repeat loads timed the same as the initial ones? And when was the
decision made to launch the second round?
Roundtrip times, estimated calc times:
First: 172/30, 199/57
First favicon.ico to Second: 76/<0: Second must have been started before
favicon.ico was received, but
First view.html to Second expt.html: (50.257-49.982=) 275/133
So it spent about 76 millis longer (133 vs. 57) thinking about whether to repeat than it
did to decide to load favicon.ico and not a bunch of other assets that
normally get loaded. Now let's look at the roundtrip times of the next
series, to see if they were all sucked at once, or the same process was
followed.
Roundtrip times, estimated calc times:
Second: 83/<0, 142/0, veddy interesting.
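For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the gaps and the estimated calc times from the six log lines of 01/13. The names (LOG, to_ms, deltas, calc_estimates, cycle_ms) are made up for illustration. It follows the reasoning above in charging 2 x 71 = 142 ms of network time to each request. One thing worth checking: ping's time= value is normally the full round trip, not one direction, so the network cost per request may be closer to 71 ms, which would make every calc estimate 71 ms larger.

```python
from datetime import datetime

# Request times copied from the 01/13 log excerpt.
LOG = [
    ("06:20:49.810", "expt.html"),
    ("06:20:49.982", "view.html"),
    ("06:20:50.181", "favicon.ico"),
    ("06:20:50.257", "expt.html"),
    ("06:20:50.340", "view.html"),
    ("06:20:50.482", "favicon.ico"),
]


def to_ms(stamp):
    """Milliseconds since midnight for an HH:MM:SS.mmm stamp."""
    t = datetime.strptime(stamp, "%H:%M:%S.%f")
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * 1000 + t.microsecond // 1000


def deltas(log):
    """Gap in ms between each request and the one logged just before it."""
    times = [to_ms(stamp) for stamp, _ in log]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def calc_estimates(gaps, cycle_ms=142):
    """Gap minus assumed network time per request (default 2 x 71 ms).

    A negative value means the request must have been sent before the
    previous response could have arrived.
    """
    return [gap - cycle_ms for gap in gaps]


if __name__ == "__main__":
    gaps = deltas(LOG)
    for (_, name), gap, calc in zip(LOG[1:], gaps, calc_estimates(gaps)):
        print(f"{name:12s} gap {gap:4d} ms   calc {calc:4d} ms")
    # First view.html to second expt.html, skipping the favicon.ico request.
    print("view -> expt:", to_ms(LOG[3][0]) - to_ms(LOG[1][0]), "ms")
```

Calling calc_estimates(gaps, cycle_ms=71) shows the same table under the round-trip reading of the ping output.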
---
Spider-like behavior is also happening with a wipe and install of High
Sierra: below I show one jpeg being fetched without a web page twice,
and no other activity at the time. In this case I haven't installed
Chrome, using Firefox and Safari instead. I'm going to try installing Linux.
Last seeming-human activity, involving a series of normal
back-and-forths with multiple loads:
01/16 07:18:15.653 INFO com.priot.servlet.GetMult - REQ 0/v r 0 repeat false
Then the first odd load without a page:
01/16 07:46:09.933 INFO c.p.s.FileSystemResourceServlet - Mapping rodin.jpg
Next seems like me, as above:
01/16 08:24:40.393 INFO com.priot.servlet.GetMult - GetMult POST
...
01/16 08:25:26.891 INFO com.priot.servlet.GetMult - REQ 0/v r 0 repeat false
Then the apparent robot:
01/16 08:55:01.199 INFO c.p.s.FileSystemResourceServlet - Mapping rodin.jpg
Eventually likely-me again:
01/16 17:36:22.601 INFO com.priot.servlet.GetMult - GetMult POST
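The same check can be done mechanically. This is a rough Python sketch, not what I actually ran: it flags "Mapping" lines that have no GetMult activity nearby. It uses only the log lines shown above and treats GetMult entries as a sign of a human session. The 10-minute window, the year 2018 (the log omits it) and the helper names are all assumptions for illustration.

```python
import re
from datetime import datetime, timedelta

YEAR = 2018  # assumption: the log lines carry no year
LINE = re.compile(r"^(\d\d/\d\d \d\d:\d\d:\d\d\.\d{3}) INFO (\S+) - (.*)$")

# Only the lines quoted in the post; the elided ones are not reconstructed.
LOG_LINES = [
    "01/16 07:18:15.653 INFO com.priot.servlet.GetMult - REQ 0/v r 0 repeat false",
    "01/16 07:46:09.933 INFO c.p.s.FileSystemResourceServlet - Mapping rodin.jpg",
    "01/16 08:24:40.393 INFO com.priot.servlet.GetMult - GetMult POST",
    "01/16 08:25:26.891 INFO com.priot.servlet.GetMult - REQ 0/v r 0 repeat false",
    "01/16 08:55:01.199 INFO c.p.s.FileSystemResourceServlet - Mapping rodin.jpg",
    "01/16 17:36:22.601 INFO com.priot.servlet.GetMult - GetMult POST",
]


def parse(line):
    """Return (timestamp, logger, message), or None if the line doesn't match."""
    m = LINE.match(line)
    if not m:
        return None
    stamp, logger, msg = m.groups()
    when = datetime.strptime(f"{YEAR}/{stamp}", "%Y/%m/%d %H:%M:%S.%f")
    return when, logger, msg


def orphan_fetches(lines, window=timedelta(minutes=10)):
    """File fetches with no GetMult entry within `window` on either side."""
    events = [e for e in map(parse, lines) if e]
    session_times = [t for t, logger, _ in events if logger.endswith("GetMult")]
    orphans = []
    for t, _, msg in events:
        if not msg.startswith("Mapping "):
            continue
        if not any(abs(t - s) <= window for s in session_times):
            orphans.append((t.strftime("%m/%d %H:%M:%S"), msg.split(" ", 1)[1]))
    return orphans


if __name__ == "__main__":
    for when, name in orphan_fetches(LOG_LINES):
        print(when, name)
```

With the 10-minute window both rodin.jpg fetches are flagged. Widening the window to 40 minutes clears them, since each sits roughly half an hour from the nearest GetMult line.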
---
I installed Ubuntu on the MacBook, and with no macOS around, I just saw
another unexplained load of the same page that was loaded yesterday at
about the same time (01/16 07:46:09.933 and 08:55:01.199 with macOS,
01/17 07:47:23.052 with Ubuntu).
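To put a number on "about the same time", here is a small Python sketch that measures how far a later request lands from a whole number of days after an earlier one. The year 2018 is assumed, since the log omits it, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta

YEAR = 2018  # assumption: the log lines carry no year


def parse(stamp):
    """Parse an 'MM/DD HH:MM:SS.mmm' log stamp into a datetime."""
    return datetime.strptime(f"{YEAR}/{stamp}", "%Y/%m/%d %H:%M:%S.%f")


def offset_from_daily(earlier, later):
    """Difference between the actual gap and the nearest whole number of days."""
    gap = parse(later) - parse(earlier)
    whole_days = round(gap / timedelta(days=1))
    return gap - timedelta(days=whole_days)


if __name__ == "__main__":
    # First odd load on macOS vs. the load seen under Ubuntu the next day.
    print(offset_from_daily("01/16 07:46:09.933", "01/17 07:47:23.052"))
    # Spacing between the two odd loads on 01/16.
    print(parse("01/16 08:55:01.199") - parse("01/16 07:46:09.933"))
```

The Ubuntu load comes 73.119 seconds later in the day than the first macOS one. The two macOS loads on 01/16 are 1 hour 8 minutes 51.266 seconds apart.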