I'm trying to use preg_split to split an HTML text string $text_thats_too_long on whitespace, keep the first $word_limit words, and then recombine them with implode(). The goal is to truncate the text, but only on word boundaries. Until now I have been using something like this:
$truncated_text = implode(' ', array_slice(preg_split('/\s+/', $text_thats_too_long), 0, $word_limit)).'...';
The problem is that if $text_thats_too_long contains a hyperlink anchor tag (or any HTML tag with a space in it), preg_split also splits on the space inside the tag. This is undesirable because the truncation can then cut a tag in half or leave tags unclosed, producing malformed HTML.
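To illustrate, here is a minimal sketch of the failure. The sample string and the truncate_words() helper are made up for demonstration (the helper just wraps the one-liner above); they are not from my real code.

```php
<?php
// Hypothetical helper wrapping the one-liner above, for demonstration only.
function truncate_words($text, $word_limit) {
    return implode(' ', array_slice(preg_split('/\s+/', $text), 0, $word_limit)).'...';
}

// Made-up sample input; the anchor tag has a space between "a" and "href".
$text_thats_too_long = 'Read the <a href="http://example.com/">full article here</a> for more details.';

// The tag is cut in half: the split happened on the space inside <a href=...>.
echo truncate_words($text_thats_too_long, 3), "\n";
// prints: Read the <a...

// The tag survives intact, but its closing </a> is dropped.
echo truncate_words($text_thats_too_long, 5), "\n";
// prints: Read the <a href="http://example.com/">full article...
```

In both cases the space inside the tag is counted as a word boundary, so "<a" and "href=..." are treated as two separate words.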
Is there some other regular expression I can use to ensure I won't be preg_splitting on spaces within tags, or is this impossible to accomplish with a regular expression?