I have an 'email a friend' page on my site.
The idea is that if a user finds an interesting page, they can have an email about it sent to a friend. I have a link like this:
<a href="javascript:;" onClick="MM_openBrWindow('email-a-friend.php?page_url=<?= urlencode($_SERVER['REQUEST_URI']) ?>&page_title=' + escape(document.title),'','scrollbars=yes,width=450,height=400')">EMAIL A FRIEND</a>
Thus, I am passing the current REQUEST_URI to the script via $_GET['page_url']. This means that any schmoe might be able to automate requests to my page with their own evil URL that has nothing to do with my site.
So I would like to scrub anything I find in $_GET['page_url'] to make ABSOLUTELY CERTAIN that my 'email a friend' feature will not send any emails that link to someone else's site. I would like to create an effective regular expression to check for all the common spam hacks (and possibly the uncommon ones too).
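To make the goal concrete, here is a rough sketch of the kind of check I have in mind, written as a whitelist rather than a blacklist of spam tricks: accept only a site-relative path built from a small set of characters, then prepend my own host. The function name safe_page_url and the host www.example.com are placeholders I made up for illustration, and the allowed character set is an assumption that would need adjusting to the URLs the site really uses. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of any library): returns an absolute URL on
// our own host, or null if the submitted value is not a plain site-relative path.
function safe_page_url(string $raw, string $ownHost): ?string
{
    // Whitelist: must start with "/" and contain only characters we expect in
    // our own URLs. This rejects line breaks, spaces, backslashes, colons,
    // quotes and angle brackets. The D modifier stops "$" from matching
    // just before a trailing newline.
    if (!preg_match('!^/[A-Za-z0-9._~/%?&=+#-]*$!D', $raw)) {
        return null;
    }

    // "//host/path" is a protocol-relative URL that points at another site.
    if (strncmp($raw, '//', 2) === 0) {
        return null;
    }

    // Use a hard-coded host (assumption: www.example.com stands in for the
    // real one) instead of $_SERVER['HTTP_HOST'], which the client controls.
    return 'http://' . $ownHost . $raw;
}

// Example use inside email-a-friend.php:
$pageUrl = safe_page_url($_GET['page_url'] ?? '', 'www.example.com');
if ($pageUrl === null) {
    // Refuse to send the email at all rather than trying to "clean" the value.
}
```

The point of the design is that the script never has to recognize an evil URL; it only has to recognize one of its own, and everything else is refused.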
I have noticed in spam I have received that the <a> tags often have line breaks, multiple instances of 'http' or 'mailto' in the href attribute, etc. Here's one example:
<A href="h
ttp:\/kdsqbnqkfhe.org%2Ekdukjcmiqmve%2Eulle
ladt.info#rkyerumlzuvu.net">
<FONT SIZE=1></FONT><IMG SRC="cid:0f6801c5259f$55ba1aa0$37bfd2ac@gjm.com" border="0" ALT=""></A>
Notice the line breaks. There's even one after the 'h' in 'http'. Also notice the URL encoding (%2E) and the '#' character. If anyone could help me understand how an http URL like this gets parsed, I would appreciate it.
ALSO:
What's up with images like this:
<img src="cid:tgyhbjir_hbqmtfqw_sqbaqfsj">
How do these work?