My script has a huge string (the raw contents of an HTML file) that it needs to parse. It should go through the string and pull out every URL that appears as a link. For instance, if it sees <A href="bob.html">, it should pull out bob.html as a result.
The pattern that I'm using for my eregi() call is this: (href=\"?)(.*)([\" >])
Sometimes the quotes are omitted from the href attribute, so the following are also possible: <A href=bob.html> or even <A Href=bob.html onClick="">. In every case, I need it to pull out exactly: bob.html
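To make the cases above concrete, here is a minimal sketch that runs my current pattern against each sample and prints what it captures next to what I want. It assumes preg_match() with the i flag as a stand-in for eregi() (I believe both match greedily here), and the helper name first_href() is made up for this post. It only reproduces the behaviour; it is not a fix.

```php
<?php
// Hypothetical helper (not part of my real script): returns whatever the
// second group of my current pattern captures, or null if nothing matches.
// Assumption: preg_match() with the "i" flag stands in for eregi().
function first_href(string $html): ?string
{
    $pattern = '/(href="?)(.*)([" >])/i';
    if (preg_match($pattern, $html, $m)) {
        return $m[2];
    }
    return null;
}

// Sample input => the value I want to get back.
$samples = [
    '<A href="bob.html">'          => 'bob.html',
    '<A href=bob.html>'            => 'bob.html',
    '<A Href=bob.html onClick="">' => 'bob.html',
];

foreach ($samples as $html => $wanted) {
    $got = var_export(first_href($html), true);
    printf("%-30s wanted: %-10s got: %s\n", $html, $wanted, $got);
}
```

Running this should show which of the three inputs come back as exactly bob.html and which come back with extra characters attached.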
What's wrong with the pattern I'm using? Did I express the space character (ASCII 32) correctly, or does it need to be escaped in some way?
Thanks!