OK, let me start by throwing my hands in the air and admitting that regular expressions have always confused the pants off me. No surprise, then, that I have got myself totally stuck on a problem that I am sure is quite simple.
Basically, I am looking for an expression to find all links pointing to a certain part of a particular site within the pages that I am crawling. I want to return the full link URL and the text that is being used to link.
For example:
<a href="http://mysite.com/mydir/xxx.xx">YYYYY</a>
I'm not bothered about links containing optional target attributes, or image links, as I am already stripping these.
It should, though, work no matter how deep under mydir the link appears, and whether or not quotes are used around the URL.
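To make those requirements concrete, here is a minimal sketch of one pattern that would seem to fit them, written in Python purely for illustration (the post does not say which language the crawler uses). The names BASE, LINK_RE and find_links are made up for this example, and the base URL is simply taken from the example link above. It assumes, as stated, that target attributes and image links have already been stripped, so each anchor looks like <a href=...>text</a>.

```python
import re

# Assumed base URL, copied from the example link; adjust for the real site.
BASE = "http://mysite.com/mydir/"

# Hypothetical pattern: matches <a href=URL>text</a> where URL starts with BASE.
LINK_RE = re.compile(
    r'<a\s+href\s*=\s*'                              # opening tag up to the href value
    r'(?P<quote>["\']?)'                             # optional opening quote
    r'(?P<url>' + re.escape(BASE) + r'[^"\'\s>]*)'   # URL at any depth under mydir
    r'(?P=quote)'                                    # same quote again (or nothing)
    r'\s*>'                                          # end of the opening tag
    r'(?P<text>.*?)'                                 # link text, as short as possible
    r'</a>',
    re.IGNORECASE | re.DOTALL,
)

def find_links(html):
    """Return a list of (url, link_text) pairs for links under BASE."""
    return [(m.group("url"), m.group("text").strip())
            for m in LINK_RE.finditer(html)]

sample = (
    '<a href="http://mysite.com/mydir/xxx.xx">YYYYY</a> '
    '<a href=http://mysite.com/mydir/a/b/c.html>Deep</a> '
    '<a href="http://other.com/mydir/x">Elsewhere</a>'
)
print(find_links(sample))
```

With the sample string above, this should pick up the quoted link and the unquoted, deeper link, and skip the one on the other host. A pattern like this is only a rough fit for well-behaved markup; anything with extra attributes or nested tags inside the link would need either a looser pattern or a proper HTML parser.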
If someone can help me with this, and maybe post a link to a buffoon's guide to regex, it would be much appreciated!
Thanks in advance.