Originally posted by virva
Hmm... I tried it, but it doesn't work, at least not like this:
Okay, I have a bit more information now and can give a simpler answer. Since only one anchor element can have a given name (and I'm guessing that the Työtehtävät anchor doesn't have any other attributes), the pattern can be simplified:
'#<a name="työtehtävät">(.*)<a name="kelpoisuusehdot">#is'
Note that this assumes the two tags are written exactly as <a name="työtehtävät"> and <a name="kelpoisuusehdot">, apart from upper-case letters such as <A NAME=...>, which the i modifier handles. If either tag has other attributes after the name, putting [^>]* (any run of characters other than >) before the corresponding > will take care of them.
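Here's a rough sketch of both versions side by side. It assumes the page and your PHP source file are both UTF-8, so the ä characters in the pattern match the page byte for byte. The sample HTML is made up, and I gave the second anchor an extra id attribute so you can see the exact version fail where the tolerant one still matches. The ^ inside the brackets matters: [^>]* means "any characters except >".

```php
<?php
// Made-up sample HTML; the second anchor has an extra id attribute.
$html = '<p>Intro</p><a name="työtehtävät">Tasks</a><p>Job description here.</p>'
      . '<a name="kelpoisuusehdot" id="req">Requirements</a>';

// Exact-tag pattern: only matches if both tags are written exactly like this.
$exact = '#<a name="työtehtävät">(.*)<a name="kelpoisuusehdot">#is';

// Tolerant pattern: [^>]* skips any further attributes before each closing >.
$tolerant = '#<a name="työtehtävät"[^>]*>(.*)<a name="kelpoisuusehdot"[^>]*>#is';

// The exact pattern finds no match here because of the id="req" attribute.
var_dump(preg_match($exact, $html));

$found = preg_match($tolerant, $html, $m);
if ($found === 1) {
    // $m[1] is the text between the two anchors, tags excluded.
    echo $m[1], "\n";
}
```

preg_match() returns 1 on a match, 0 on no match (and false on error), which is why the check uses === 1.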
Be aware also that the captured text doesn't include the <a> tags themselves. If you want those as well, move the parentheses outward so they enclose the tags too (just keep them inside the # delimiters).
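Something like this, for example (again a sketch with a made-up fragment, using the exact-tag pattern):

```php
<?php
// Made-up sample fragment.
$html = 'x<a name="työtehtävät">Tasks</a> body <a name="kelpoisuusehdot">Req';

// Parentheses moved outward so the group also captures both <a ...> tags.
// They are still inside the # delimiters; only their position has changed.
$withTags = '#(<a name="työtehtävät">.*<a name="kelpoisuusehdot">)#is';

if (preg_match($withTags, $html, $m) === 1) {
    echo $m[1], "\n";
}
```

Since the group now covers the whole pattern, $m[0] (the full match) holds exactly the same text, so you could also drop the parentheses altogether and just use $m[0].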
One last thing: because of the way you're reading the page in, this will only match if the entire text you want to match (tags included) lands inside a single chunk of at most 4096 bytes (and with fgets(), each chunk is usually just one line). Instead of testing each chunk separately, concatenate the chunks as you read them and test the concatenated string. If the pattern is in the page, it will then be in that string, even if it spreads across more than one chunk. That's just a matter of replacing $html = with $html .=. You could test after every chunk, but it's simpler to run the match once after the loop, when $html holds the whole page. Oh, and replacing fgets() with fread() might speed things up a bit, because fread() doesn't stop at every newline character.
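A rough sketch of the reading loop with fread() and .=, followed by a single match once the whole page is in memory. The function names read_whole_stream() and extract_section() are just made up for this example, and the URL in the usage comment is a placeholder (opening an http:// URL with fopen() also needs allow_url_fopen enabled).

```php
<?php
// Reads an already-open stream to the end, appending every chunk to one string.
function read_whole_stream($fp): string
{
    $html = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 4096); // up to 4096 bytes; doesn't stop at newlines
        if ($chunk === false) {
            break;                 // read error: keep whatever was read so far
        }
        $html .= $chunk;           // .= appends; plain = would discard earlier chunks
    }
    return $html;
}

// Returns the text between the two anchors, or null if the pattern isn't found.
function extract_section(string $html): ?string
{
    $pattern = '#<a name="työtehtävät">(.*)<a name="kelpoisuusehdot">#is';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Usage (placeholder URL):
// $fp = fopen('http://example.com/page.html', 'r');
// $html = read_whole_stream($fp);
// fclose($fp);
// echo extract_section($html);
```

Because the match runs on the complete string, it no longer matters where the chunk boundaries fall. If you don't need the loop at all, file_get_contents() reads the whole page into a string in one call.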