Hey everybody!
I spend a fair amount of time looking through my Apache logs, and I always see entries like these, which get me thinking about security again:
68.183.193.242 - - [20/Mar/2024:10:59:08 +0000] "GET /Temporary_Listen_Addresses HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:08 +0000] "GET /ews/exchanges/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:08 +0000] "GET /ews/exchange%20/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:09 +0000] "GET /ews/exchange/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:09 +0000] "GET /ews/%20/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:09 +0000] "GET /ews/ews/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:09 +0000] "GET /ews/autodiscovers/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:09 +0000] "GET /autodiscover/autodiscovers/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:09 +0000] "GET /autodiscover/autodiscover%20/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:09 +0000] "GET /autodiscover/autodiscoverrs/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
68.183.193.242 - - [20/Mar/2024:10:59:09 +0000] "GET /autodiscove/ HTTP/1.1" 404 1625 "-" "Mozilla/5.0 zgrab/0.x"
135.125.244.48 - - [20/Mar/2024:11:16:16 +0000] "GET /.env HTTP/1.1" 404 397 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
135.125.244.48 - - [20/Mar/2024:11:16:16 +0000] "POST / HTTP/1.1" 404 397 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
194.38.23.16 - - [20/Mar/2024:14:12:31 +0000] "GET /sites/all/modules/civicrm/packages/OpenFlashChart/php-ofc-library/ofc_upload_image.php HTTP/1.1" 404 1494 "-" "ALittle Client"
194.38.23.16 - - [20/Mar/2024:14:12:31 +0000] "GET /php-ofc-library/ofc_upload_image.php HTTP/1.1" 404 1494 "-" "ALittle Client"
194.38.23.16 - - [20/Mar/2024:14:12:32 +0000] "GET /sites/default/modules/civicrm/packages/OpenFlashChart/php-ofc-library/ofc_upload_image.php HTTP/1.1" 404 1494 "-" "ALittle Client"
35.94.93.42 - - [20/Mar/2024:15:38:50 +0000] "GET /.git/HEAD HTTP/1.1" 404 360 "-" "Python-urllib/3.10"
161.97.147.235 - - [20/Mar/2024:16:12:08 +0000] "GET /wp-login.php HTTP/1.1" 404 3439 "http://www.example.com/wp-login.php" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/95.0"
174.53.49.200 - - [20/Mar/2024:16:38:24 +0000] "-" 408 0 "-" "-"
185.254.196.173 - - [20/Mar/2024:18:42:13 +0000] "GET /.env HTTP/1.1" 404 397 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
135.125.244.48 - - [20/Mar/2024:19:03:50 +0000] "POST / HTTP/1.1" 404 397 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
172.104.11.4 - - [20/Mar/2024:19:47:51 +0000] "\x16\x03\x01" 400 392 "-" "-"
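For anyone who wants to pick lines like these apart programmatically, here is a minimal Python sketch that parses Apache's combined log format with a regular expression. It assumes the stock combined LogFormat (host, ident, user, timestamp, request, status, bytes, referer, user agent). COMBINED_RE and parse_line are illustrative names, not part of any library. The request field is split defensively because timeouts are logged as "-" and junk such as the TLS bytes above is logged as escaped text.

```python
import re
from typing import Optional

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent".
# Quoted fields may contain backslash escapes (\" or \xhh), so allow "\<any char>" inside them.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one combined-format log line, or None if it doesn't match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # A normal request is "METHOD PATH PROTOCOL"; a 408 logs "-" and
    # binary junk logs escaped bytes, so only unpack when there are 3 parts.
    parts = entry["request"].split(" ")
    if len(parts) == 3:
        entry["method"], entry["path"], entry["protocol"] = parts
    else:
        entry["method"] = entry["path"] = entry["protocol"] = None
    return entry
```

Reading a whole file is then just a loop over its lines, skipping the ones where parse_line returns None.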
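Some of these are trying to sniff out secret credential files (.env), some are looking for Git repos (.git/HEAD), some are probing for Microsoft Exchange endpoints (/ews/, /autodiscover/), some open a connection and never send a request at all (the 408 Request Timeout), some are seeking WordPress entry points (wp-login.php), some are looking for particular modules that probably contain exploits (ofc_upload_image.php), and some are sending raw binary (\x16\x03\x01 is the start of a TLS handshake, i.e. a client speaking HTTPS to the plain-HTTP port).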
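As a baseline before any machine learning, the categories above can be tagged with plain pattern matching. This is a rule-based heuristic, not a trained model, and the patterns are drawn only from the sample log above, so treat them as a hypothetical starting point. classify and PROBE_PATTERNS are made-up names.

```python
import re
from typing import Optional

# Hypothetical probe categories, taken only from the sample log above.
# Each entry is (label, regex searched against the request path).
PROBE_PATTERNS = [
    ("credentials", re.compile(r"/\.env$")),
    ("git-repo", re.compile(r"/\.git/")),
    ("wordpress", re.compile(r"/wp-login\.php$")),
    ("exchange", re.compile(r"^/(ews|autodiscov)", re.IGNORECASE)),
    ("module-probe", re.compile(r"ofc_upload_image\.php$")),
]


def classify(request: str, status: int) -> Optional[str]:
    """Return a probe label for one logged request line, or None if nothing matched."""
    if status == 408 and request == "-":
        return "no-request"  # connection opened, request never sent
    # Apache logs non-printable bytes as \xhh text, so match the escaped form.
    if request.startswith(r"\x16\x03"):
        return "tls-on-http"
    parts = request.split(" ")
    path = parts[1] if len(parts) == 3 else request
    for label, pattern in PROBE_PATTERNS:
        if pattern.search(path):
            return label
    return None
```

Anything that comes back as None is unlabeled rather than known-good, which is where a learned model could earn its keep.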
The HTTP responses seem correct, but that is little comfort; this still bothers me. It seems like requests like these could be aggregated into some kind of threat map and that, in turn, might help one better protect one's systems. I know that fail2ban has various jails, and I once tried its 404 jail (I believe this is the apache-noscript jail), but it caused serious problems. Missing images or CSS files, perfectly innocent ones, generated a lot of 404 responses, and the jail commonly banned an IP address behind which there were a LOT of people (e.g., the University of Michigan or some busy Starbucks somewhere), which was NOT GOOD.
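One way to avoid the innocent-404 problem is to ignore 404s for static assets entirely and count only the rest per IP. A minimal sketch, assuming (ip, path, status) tuples already pulled from the log. The extension list, the default threshold of 5, and the function names are all hypothetical. This cuts false positives from broken image or CSS links, but it does nothing about many users sharing one IP.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Extensions whose 404s are usually innocent (broken image/CSS/font links).
# This list is an assumption; adjust it for your own site.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".woff", ".woff2")


def suspicious_404_counts(entries: Iterable[Tuple[str, str, int]]) -> Counter:
    """Count non-static 404s per IP from (ip, path, status) tuples."""
    counts: Counter = Counter()
    for ip, path, status in entries:
        if status != 404:
            continue
        # Drop any query string before checking the extension.
        if path.split("?", 1)[0].lower().endswith(STATIC_EXTENSIONS):
            continue
        counts[ip] += 1
    return counts


def ips_to_flag(entries: Iterable[Tuple[str, str, int]], threshold: int = 5) -> List[str]:
    """Return IPs whose non-static 404 count reaches the (hypothetical) threshold."""
    return sorted(ip for ip, n in suspicious_404_counts(entries).items() if n >= threshold)
```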
So I have two questions:
1) Suppose I were to cook up a machine learning script that, given an Apache log full of novel requests, could classify each request as good (apparently a good-faith request) or bad (a hack attempt or a script kiddie screwing around). Would folks find this helpful? Can we think of any data we might want to extract with such a script (e.g., "guiltiest IP addresses" or "fishiest user agent strings"), or any other applications for such a tool?
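Once each request carries a good/bad label, from whatever classifier produces it, the "guiltiest IP addresses" and "fishiest user agent strings" reports are simple aggregations. A minimal sketch, assuming entries arrive as (ip, user_agent, is_bad) tuples; guiltiest is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def guiltiest(
    entries: Iterable[Tuple[str, str, bool]], n: int = 5
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Rank IPs and user agents by how many of their requests were labeled bad.

    is_bad is assumed to come from some upstream classifier
    (rule-based or machine learning); this function only aggregates.
    """
    bad_ips: Counter = Counter()
    bad_agents: Counter = Counter()
    for ip, agent, is_bad in entries:
        if is_bad:
            bad_ips[ip] += 1
            bad_agents[agent] += 1
    # most_common returns (key, count) pairs, highest count first.
    return bad_ips.most_common(n), bad_agents.most_common(n)
```

Ranking by raw count favors noisy scanners; dividing by each IP's total request count would instead surface IPs whose traffic is mostly bad.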
2) What tricks are folks using these days to keep script kiddies and scrapers out without excluding friendly users and friendly web crawlers? Lately I've seen some sites that appear to perform some kind of cookie or CAPTCHA check before they will even let you browse.
I've been thinking I might work up some code in my PHP app that sets a cookie when it detects suspicious behavior and then bans anyone who shows up with that cookie. Of course, most bad actors probably don't bother with cookies at all, so I started thinking I might instead set a cookie for every fresh visitor and, if they failed to present it on subsequent requests, just show them an error page or something. Neither of these solutions seems very good to me.
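The second idea works better if the cookie can't simply be invented or replayed forever. A minimal sketch, assuming a server-side secret: the cookie value is a timestamp plus an HMAC-SHA256 signature over the visitor's IP and that timestamp. The names issue_token and token_is_valid, the 24-hour max_age, and binding to the IP are all assumptions. IP binding will fail for visitors whose address changes mid-session (mobile users, for instance), and some legitimate crawlers may never return cookies, so they would likely need an exemption.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Sketch simplification: a real app would load a fixed secret from config,
# since regenerating it on every restart invalidates all issued cookies.
SECRET_KEY = secrets.token_bytes(32)


def _sign(ip: str, ts: str) -> str:
    return hmac.new(SECRET_KEY, f"{ip}|{ts}".encode(), hashlib.sha256).hexdigest()


def issue_token(ip: str, now: Optional[int] = None) -> str:
    """Hypothetical helper: cookie value binding the visitor's IP to an issue time."""
    ts = str(int(time.time()) if now is None else now)
    return f"{ts}.{_sign(ip, ts)}"


def token_is_valid(token: Optional[str], ip: str, max_age: int = 86400,
                   now: Optional[int] = None) -> bool:
    """Hypothetical helper: True if we issued this token to this IP within max_age seconds."""
    if not token or "." not in token:
        return False
    ts, sig = token.split(".", 1)
    if not (ts.isascii() and ts.isdigit()):
        return False
    # Constant-time comparison so the signature can't be guessed byte by byte.
    if not hmac.compare_digest(sig.encode(), _sign(ip, ts).encode()):
        return False
    current = int(time.time()) if now is None else now
    return 0 <= current - int(ts) <= max_age
```

In the PHP app the same logic maps onto hash_hmac() and hash_equals(). A request with no valid cookie would get one issued (perhaps via a redirect back to the same URL), and only a missing cookie on that follow-up request would be treated as suspicious.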