Hey all
So I have a script that crawls certain websites, fetches the content and strips out, among other things, link tags entirely.
I use preg_replace with this pattern:
'@<a[^>]*?>.*?</a>@si',
That does the job. Everyone was happy for some time, until some pages started causing problems. Eventually I found out those pages contain something like
<a href="whatever">whatever</a></a>
This pretty much breaks the rest of the script, and things get stripped out seemingly at random.
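Here is a stripped-down reproduction of what I think is going on. The sample string and the variable names are made up for this post (the real pages are much longer), so treat it as a sketch rather than the actual script:

```php
<?php
// Hypothetical sample input with the doubled closing tag.
$html = 'before <a href="whatever">whatever</a></a> after';

// The pattern from my script: remove each <a ...>...</a> pair.
$pattern = '@<a[^>]*?>.*?</a>@si';

$stripped = preg_replace($pattern, '', $html);

// The first </a> ends the match, so the extra </a> is left behind.
echo $stripped, "\n"; // prints: before </a> after
```

So the link itself is removed, but the stray closing tag stays in the output, and I suspect that leftover is what confuses the later steps.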
I have a pretty weak grasp of regexes. I tried to change the regex so it handles this kind of situation, but so far to no avail.
Any help would be greatly appreciated!!!