Ok, the options for loading the page in are all defined in the manual.
For some reason, regexes and I don't mix - but I try to get along with them.
That said, grabbing my "Mastering Regular Expressions" manual, this may work:
$htmlfile = preg_replace("/<a\s+href=['\"]?([\S]*)['\"]?[\s>]+/", $new_url, $htmlfile);
I'm used to regex gurus jumping on my suggestions and bitchslapping me, but it should work. Here's what it does, for your future regex knowledge:
preg_replace("/<a\s+href=
There should be one space between '<a' and 'href=', but there may be more; \s+ tells regex to look for one or more whitespace characters between <a and href
['"]? There may or may not be a ' or a " to start the actual url.
([\S]*) As it's possible that a link could look like <a href=>, this matches zero or more non-space characters (the URL itself), until it hits ...
['"]? Once again, optional quotes
[\s>]+ This should close off the end of the URL, covering most eventualities .. it looks for a space or a > which covers:
<a href=blah>
and
<a href=blah onClick="plop();">
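If you want to see the pieces working together, here's a rough sketch rather than a definitive answer. A few things in it are my own assumptions and not part of the pattern above: the sample values for $new_url and $htmlfile are made up, I've added the i modifier so <A HREF= is caught too, and I've narrowed the URL part to "anything that isn't a quote, whitespace or >" so it can't run past the end of the tag. It also uses preg_replace_callback so that only the URL is swapped and the <a href= part is kept.

```php
<?php
// Hypothetical inputs: swap in your own URL and page contents.
$new_url  = 'http://example.com/new';
$htmlfile = '<a href="old.html">one</a> <a  href=blah onClick="plop();">two</a> <a href=>three</a>';

// Group 1: "<a", one or more whitespace characters, "href=", optional opening quote.
// Group 2: the URL, i.e. zero or more characters that are not a quote, whitespace or ">".
$pattern = '/(<a\s+href=[\'"]?)([^\'"\s>]*)/i';

// A callback keeps group 1 and replaces only the URL; it also means that
// "$1" or "\1" inside $new_url is never treated as a backreference.
$result = preg_replace_callback(
    $pattern,
    function (array $m) use ($new_url) {
        return $m[1] . $new_url;
    },
    $htmlfile
);

echo $result, "\n";
```

Like the original pattern, this only catches links where href is the first attribute after <a, so treat it as a starting point.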
It's a fairly simple procedure as far as regular expressions go - a basic regex tutorial, or even the one in the manual, should give you the basic knowledge.
Hope this helps, and thank you for allowing me to find another excuse not to do work 🙂