So I'm making a program that copies some other Web page to my server, and I want the copied page to show up exactly like the original. Well, that's a problem if the page I'm copying has relative paths in its source code, because the browser will try to find the images on MY server rather than where they're actually hosted.
So what I do is replace occurrences of "../" or "../../" or "./" etc. with the appropriately truncated version of the user-given URL.
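To make the replacement concrete, here is a minimal sketch of the resolution step, assuming Python and its standard `urllib.parse.urljoin` (my program isn't necessarily in Python, and the `resolve` helper and the example.com URLs are made up for illustration). It shows what each kind of relative reference should turn into for a given page URL:

```python
from urllib.parse import urljoin

def resolve(page_url, ref):
    """Resolve a possibly relative reference against the URL of the copied page.

    Hypothetical helper: strips stray spaces around the reference, then lets
    urljoin handle "../", "./" and bare relative paths.
    """
    return urljoin(page_url, ref.strip())

page = "http://example.com/a/b/page.html"  # assumed example URL

print(resolve(page, "../img/x.jpg"))                   # one level up
print(resolve(page, "../../x.png"))                    # two levels up
print(resolve(page, "./pic.jpg"))                      # same directory
print(resolve(page, " somedirectory/something.jpg "))  # implicit "./" with rogue spaces
print(resolve(page, "http://other.org/y.gif"))         # already absolute, left alone
```

Handing the truncation to a URL-joining routine means the "../" counting doesn't have to be done by hand.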
It's easy to find "../"'s and the like, but it's proven difficult to find relative locations of this form:
<a href="somedirectory/something.jpg">
which is, of course, just an implicit "./". My regular expression has gone from ghastly to horrific because I want it to account for rogue (but legal) spaces like this:
<a href=" somedirectory/something.jpg ">
I've gotten to the point where my regular expression is so complicated that I wonder if I even need half of it. Hopefully someone can give me an alternative to regexps, but if not, please suggest an expression or pick apart this one:
"=[ ]*(["\']?)[ ]*([^ /\.]+\.[^ ]{2,})[ ]*(["\'>]|[ ]+[a-zA-Z])"
Some of this is complicated because I'm trying to skip instances of "Javascript1.1" (as in a script tag's language attribute) that show up from time to time in source code. Feel free to tear this apart.
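For comparison, here is the kind of non-regexp alternative I have in mind, as a rough sketch only. It assumes Python's standard `html.parser` and `urllib.parse` modules; the names `LinkRewriter`, `rewrite_links` and `URL_ATTRS` are hypothetical, and the set of attributes to rewrite is an assumption that would need extending. The idea is to let a parser deal with quotes and stray spaces, and to touch only attributes known to hold URLs, so something like language="Javascript1.1" is never considered:

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

URL_ATTRS = {"href", "src"}  # assumption: extend with "action", "background", etc.

class LinkRewriter(HTMLParser):
    """Re-emits the document, making URL-bearing attributes absolute."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.base_url = base_url
        self.out = []

    def _tag(self, tag, attrs, closing):
        parts = [tag]
        for name, value in attrs:
            if value is None:              # bare attribute such as "checked"
                parts.append(name)
                continue
            if name in URL_ATTRS:          # only these are treated as URLs
                value = urljoin(self.base_url, value.strip())
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<" + " ".join(parts) + closing

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

def rewrite_links(html_text, base_url):
    parser = LinkRewriter(base_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

base = "http://example.com/a/b/page.html"  # assumed example URL
print(rewrite_links('<a href=" somedirectory/something.jpg ">x</a>', base))
print(rewrite_links('<script language="Javascript1.1">var a = 1;</script>', base))
```

One caveat with this sketch: tags are rebuilt rather than copied, so tag and attribute names come out lowercased and attribute values are always double-quoted, which means the output is not byte-for-byte identical to the original markup even where no URL changed.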