Forgive me, I initially didn't realize that there were responses to this thread because the forum has various bugs and isn't updating response counts, etc.
schwim I'm confused, where's your 4,000 wp-* requests per hour?
Oh, this is just a small extract from a server that isn't very busy. I've seen those script kiddies sounding out my server for a WP install, and boy does it rankle me. I've long contemplated setting up some page to handle those requests, but I haven't done so because I'm not sure what measures I might take against the script kiddies and, if I'm just collecting information, I'm not sure what I would even do with the collected info. Perhaps cram it into some table for later assessment?
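For the "cram it into some table" idea, something like the following is what I have in mind. It's only a sketch: the PDO connection details and the probe_log table and its columns are hypothetical placeholders, not anything that exists yet.

<?php
// Sketch: log suspicious wp-* probes into a table for later assessment.
// The DSN/credentials and the probe_log(ip, user_agent, request_uri, seen_at)
// table are hypothetical placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

if (preg_match('#/wp-#i', $_SERVER['REQUEST_URI'])) {
    $stmt = $pdo->prepare(
        'INSERT INTO probe_log (ip, user_agent, request_uri, seen_at)
         VALUES (:ip, :ua, :uri, NOW())'
    );
    $stmt->execute([
        ':ip'  => $_SERVER['REMOTE_ADDR'],
        ':ua'  => $_SERVER['HTTP_USER_AGENT'] ?? '',
        ':uri' => $_SERVER['REQUEST_URI'],
    ]);

    // No point rendering anything useful for a WP probe on a non-WP site.
    http_response_code(404);
    exit;
}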
deny from 20.172.38.178
dalecosp I'd actually feel better if it modified firewall rules, but you run a high risk of screwing things up royally as a fat-fingered URL in your browser might lock you out pretty hard....
Thanks for this, @dalecosp. Are you not concerned that blocking an entire IP address might lock out well-intentioned visitors? E.g., what if that IP represents a busy coffee shop wifi network? Or a large corporate office? Or the entire University of Michigan? I once tried enabling the apache-noscript jail in fail2ban, and it had the very bad side effect of blocking nearly all my visitors because my markup (which was admittedly very ugly thanks to a sloppy front-end guy) had all kinds of bad links to transparent GIFs. That was back in the Bad Old Days when CSS didn't really work and IE was in widespread use.
dalecosp It also bans UAs that don't provide a UA string and some clients by default ("go-http-client" comes to mind...)
I've seen those script kiddy UAs, and they also rankle me. I worry about damaging my SEO reputation if I accidentally block search engines or bots. Is an empty UA inherently bad? Would privacy settings or some popular search engine ever show up with an empty UA? Or might a search engine (or social preview bot) show up with an unexpected curl UA or some default UA set by a code library? What criteria are we using to distinguish good UAs from bad UAs?
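For what it's worth, here's the kind of UA screening I imagine those rules boil down to; treat this as a sketch only. The empty-UA and go-http-client checks come straight from dalecosp's post, while the other fragments are my guesses at common library defaults, which is exactly where a legitimate preview bot could get caught.

<?php
// Sketch of crude UA screening: an empty UA or a known library-default UA is
// treated as suspicious. 'go-http-client' is from dalecosp's post; the other
// fragments are guesses, not a vetted blocklist.
function is_suspicious_user_agent(?string $ua): bool
{
    if ($ua === null || trim($ua) === '') {
        return true; // no UA string at all
    }

    $badFragments = ['go-http-client', 'curl/', 'python-requests'];
    foreach ($badFragments as $fragment) {
        if (stripos($ua, $fragment) !== false) {
            return true;
        }
    }

    return false;
}

if (is_suspicious_user_agent($_SERVER['HTTP_USER_AGENT'] ?? null)) {
    http_response_code(403);
    exit('banned');
}

This doesn't answer the search-engine question, of course; a rule like this could misfire in exactly the ways described above.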
schwim I don't add to the htaccess but my sites all run through index.php and the very first thing I do on page load is check for their existence in the ban table, stopping page load with a simple "banned" before spending any overhead on the visitor. The second thing I do is check URL string, UA, IP, POST/GET data, referer, etc for honeypot stuff, banning them if a match is found.
Low-effort ban results (like a script trying all the popular WP exploits) get a one-hour ban, while more serious efforts get 30 days.
This sounds like what I am most likely to do, although my framework or mod_rewrite might need some mods or massage to be able to handle requests that don't match any of my defined endpoints or routes. I've considered adding honeypot endpoints/routes to sniff out the common forms of snooping, exact actions TBD. I would again ask: are you not concerned about blocking by IP address?

How do we feel about setting some kind of cookie that identifies bad actors like a big red "dunce" cap? Let's call that approach DunceCookie. I guess most bad requests skip cookies entirely, so DunceCookie doesn't sound like it would help with the vast majority of script kiddy activity. Alternatively, perhaps we set a cookie on the first page request and, if that cookie is missing on subsequent requests, we refuse to do anything useful for the visitor. Let's call this FriendCookie. If someone visits and there's no FriendCookie, we could show them a WELCOME/CONTINUE message and attempt to set the FriendCookie. On any subsequent visit, if they have the FriendCookie, we serve them and feel better about them. If they screw around, we convert FriendCookie to DunceCookie. Any cookieless request would get the welcome page and nothing else.
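To make the FriendCookie/DunceCookie idea concrete, here's a rough sketch of the gate I'm imagining at the top of index.php. The cookie names and welcome text are just the hypothetical ones from the paragraph above, and the ban-table lookup schwim describes would run before (or instead of) this.

<?php
// Rough sketch of the FriendCookie/DunceCookie gate, meant to run at the very
// top of index.php before any real work. All names are hypothetical.

const FRIEND_COOKIE = 'FriendCookie';
const DUNCE_COOKIE  = 'DunceCookie';

// Known bad actor: refuse to do anything useful.
if (isset($_COOKIE[DUNCE_COOKIE])) {
    http_response_code(403);
    exit('banned');
}

// No FriendCookie yet: try to set one and show only the welcome page.
if (!isset($_COOKIE[FRIEND_COOKIE])) {
    setcookie(FRIEND_COOKIE, bin2hex(random_bytes(16)), [
        'expires'  => time() + 86400 * 30,
        'path'     => '/',
        'httponly' => true,
        'samesite' => 'Lax',
    ]);
    // The real page would be a WELCOME/CONTINUE screen linking back to the
    // requested URL so the next request carries the cookie.
    exit('Welcome! Please continue to the site.');
}

// FriendCookie present: fall through to normal routing/page load.
// If the visitor later trips a honeypot, swap the cookies:
function demote_to_dunce(): void
{
    setcookie(FRIEND_COOKIE, '', time() - 3600, '/');       // clear FriendCookie
    setcookie(DUNCE_COOKIE, '1', time() + 86400 * 30, '/'); // set the dunce cap
}

As noted, most script-kiddie requests never send cookies at all, so they would only ever see the welcome page, which costs very little to serve.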
dalecosp ...turned me off. Index.php there was like a huge SWITCH statement with every possible value...
Routing everything through index.php is quite powerful, and you don't need a giant switch statement (that does sound ugly). You can use some algorithmic mapping of URLs onto PHP scripts, possibly involving a routing table. Laravel and CodeIgniter use this approach, and I've found it very useful for code organization and for establishing consistent, handy application state to handle every request with a minimum of worry and effort.
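A minimal sketch of what I mean by a routing table, with hypothetical routes: Laravel and CodeIgniter do far more, but the core idea is just a map from path to handler, and anything that misses the map is a natural place for the ban/honeypot logic discussed above.

<?php
// Minimal front-controller sketch: map the request path to a handler through a
// routing table instead of a giant switch. Routes and handlers here are
// hypothetical placeholders.

$routes = [
    '/'        => fn () => print('Home page'),
    '/about'   => fn () => print('About page'),
    '/contact' => fn () => print('Contact form'),
];

$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH) ?: '/';

if (isset($routes[$path])) {
    $routes[$path]();   // known route: dispatch to its handler
} else {
    // Unknown route: wp-* probes and other snooping always land here.
    http_response_code(404);
    echo 'Not found';
}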
dalecosp I did have a "bad_bots.php" that we included in the head of every page at my last major PHP job; it analyzed the UA string and current system load and then, if a bot, did a rand() to decide if they saw content or an HTTP/429 or /503 header. IIRC, Google never was penalized, Bing was allowed 94%-96% chance of a 200 response, and the rest were given lower chances of success.
This detail is helpful. I'd be curious how we go about recognizing bad UAs (script kiddies!) versus good UAs (Google, Bing, DuckDuckGo, others?, actual users).
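For anyone curious what that rand() gate might look like, here's my guess at its shape. The Google/Bing odds follow dalecosp's description, but the ~50% figure for other bots, the crude UA matching, and the omission of the system-load check are my own simplifications, not the actual bad_bots.php.

<?php
// Guess at the shape of the bad_bots.php approach described above: classify
// the UA, then use rand() to decide between real content and a 429/503.
// Google always passes and Bing gets roughly a 95% chance, per the post; the
// 50% odds for other bots are an invented placeholder, and the system-load
// check is omitted for brevity.
function bot_allowed(string $ua): bool
{
    $ua = strtolower($ua);

    if (str_contains($ua, 'googlebot')) {
        return true;                 // never penalized
    }
    if (str_contains($ua, 'bingbot')) {
        return rand(1, 100) <= 95;   // ~95% chance of a 200
    }
    if ($ua === '' || str_contains($ua, 'bot') || str_contains($ua, 'crawl')) {
        return rand(1, 100) <= 50;   // other bots: lower odds of success
    }

    return true;                     // everything else looks like a real visitor
}

if (!bot_allowed($_SERVER['HTTP_USER_AGENT'] ?? '')) {
    http_response_code(rand(0, 1) ? 429 : 503); // "slow down" or "unavailable"
    exit;
}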