Let's start simple and gradually work our way up to something quite complex and complete.
A first try might give us this:
$text = preg_replace( '/<img src="(.*)"\/>/', '<img src="phpThumb?src=\1"/>', $text );
There are a few problems with this solution:
1. It can't cope with the image tag having any other attributes, and we all know our image tags should always have an alt attribute.
2. It can't handle more than one image tag in any one piece of text. This is due to greedy matching in regular expressions.
3. It can't handle strangely capitalised tag and attribute names.
4. It can't handle src attributes delimited by single quotes.
5. It breaks when dealing with dynamically generated images because it doesn't URL-encode the source.
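To make the first two problems concrete, here is a small sketch that runs the first-attempt pattern against two made-up inputs. The file names and alt text are only illustrative assumptions, not taken from any real page.

```php
<?php
// The first-attempt pattern and replacement, unchanged from the post.
$pattern     = '/<img src="(.*)"\/>/';
$replacement = '<img src="phpThumb?src=\1"/>';

// Problem 1: the alt attribute (and the space before "/>") stops the pattern matching at all.
$withAlt       = '<img alt="A cat" src="/images/cat.jpg" />';   // hypothetical input
$resultWithAlt = preg_replace( $pattern, $replacement, $withAlt );

// Problem 2: with two tags, the greedy (.*) runs from the first src value to the last "/>.
$twoTags = '<img src="a.jpg"/> and <img src="b.jpg"/>';          // hypothetical input
preg_match( $pattern, $twoTags, $matches );
$captured = $matches[1];

var_dump( $resultWithAlt === $withAlt ); // bool(true): the text is returned untouched
echo $captured, "\n";                    // a.jpg"/> and <img src="b.jpg
```

The second capture swallows the end of the first tag and the start of the second, which is why the replacement mangles text containing more than one image.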
Let's deal with each problem in turn, first number 1.
We need to match the text on either side of the src attribute and put it back in the replacement so that any other attributes are preserved. We can do this with a couple of extra, well-placed subpatterns. Let's look at a naive solution.
$text = preg_replace( '/<img(.*)src="(.*)"(.*)>/', '<img\1src="phpThumb?src=\2"\3>', $text );
This is a naive solution because it falls foul of problem 2. PCRE expressions default to greedy matching. More information on this can be found in the PHP manual, but basically it means that the engine will try to match as much as it possibly can.
In the previous example each of the .* clauses will try to match as much as it possibly can, so given the text below, the first .* will match everything from just after the first <img up to the last src=.
A quick example of <img src="/images/image1.jpg" /> and then <img src="images/image2.jpg" />
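The greedy behaviour can be checked directly. This is a minimal sketch that uses preg_match with the naive pattern so that each capture can be inspected; the variable names are my own.

```php
<?php
$text = 'A quick example of <img src="/images/image1.jpg" /> and then <img src="images/image2.jpg" />';

// Same greedy pattern as above; preg_match exposes what each subpattern captured.
preg_match( '/<img(.*)src="(.*)"(.*)>/', $text, $m );

echo '[', $m[1], "]\n"; // [ src="/images/image1.jpg" /> and then <img ]
echo '[', $m[2], "]\n"; // [images/image2.jpg]
echo '[', $m[3], "]\n"; // [ /]
```

Because the first subpattern has eaten the whole of the first tag, preg_replace with this pattern would only rewrite the src of the second image.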
One solution to this is to explicitly restrict how far it matches by defining a known character or string it cannot match, like below
$text = preg_replace( '/<img([^(src)]*)src="([^"]*)"([^>]*)>/', '<img\1src="phpThumb?src=\2"\3>', $text );
In my view, though, this is a little hard to read, and it can actually fail: [^(src)] is a character class that excludes the individual characters (, s, r, c and ), not the string src, so the pattern stops matching as soon as an attribute before src (class, for example) contains one of those characters. Luckily there is another solution: un-greedy matching. There are two ways to do this. We can either add a question mark (?) after each quantifier (so .* becomes .*?), which reverses the greediness of that clause, or, if we want all our clauses to be un-greedy, we can change the default greediness for the entire expression with the U modifier. We want all our clauses to be un-greedy, so we can safely use the U modifier as below.
$text = preg_replace( '/<img(.*)src="(.*)"(.*)>/U', '<img\1src="phpThumb?src=\2"\3>', $text );
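As a quick check that the two approaches agree, the sketch below runs the U-modifier pattern and the equivalent pattern with a question mark after every quantifier over the two-image example. The variable names are mine.

```php
<?php
$text        = '<img src="/images/image1.jpg" /> and then <img src="images/image2.jpg" />';
$replacement = '<img\1src="phpThumb?src=\2"\3>';

// Option 1: flip the default greediness for the whole expression with U.
$withModifier = preg_replace( '/<img(.*)src="(.*)"(.*)>/U', $replacement, $text );

// Option 2: make each quantifier un-greedy individually with ?.
$withQuestionMarks = preg_replace( '/<img(.*?)src="(.*?)"(.*?)>/', $replacement, $text );

var_dump( $withModifier === $withQuestionMarks ); // bool(true)
echo $withModifier, "\n";
// <img src="phpThumb?src=/images/image1.jpg" /> and then <img src="phpThumb?src=images/image2.jpg" />
```

One thing to watch: with the U modifier in place, a ? after a quantifier makes that clause greedy again, so the two styles should not be mixed carelessly.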
Right, on to problem 3, strangely capitalised tag and attribute names. At the moment our expression will not match with the following text.
Here's an example <IMG ALT="some alt text" SRC="/images/image1.jpg" /> some more text
It doesn't match because the pattern only matches lower-case text. This is easily solved with another modifier, the caseless modifier (i), as below.
$text = preg_replace( '/<img(.*)src="(.*)"(.*)>/Ui', '<img\1src="phpThumb?src=\2"\3>', $text );
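Here is a small before-and-after sketch for the caseless modifier, using the upper-case tag from the example above. The variable names are my own.

```php
<?php
$text        = '<IMG ALT="some alt text" SRC="/images/image1.jpg" />';
$replacement = '<img\1src="phpThumb?src=\2"\3>';

// Without i the upper-case tag is not matched, so the text comes back unchanged.
$caseSensitive = preg_replace( '/<img(.*)src="(.*)"(.*)>/U', $replacement, $text );

// With i the tag matches regardless of capitalisation.
$caseless = preg_replace( '/<img(.*)src="(.*)"(.*)>/Ui', $replacement, $text );

var_dump( $caseSensitive === $text ); // bool(true)
echo $caseless, "\n"; // <img ALT="some alt text" src="phpThumb?src=/images/image1.jpg" />
```

Note that the tag name and src come out in lower case because they are literal text in the replacement string, while the captured ALT attribute keeps its original capitalisation.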
Not far to go now, we're on to problem 4. At the moment, our pattern won't match if the value for the src attribute is surrounded with single quotes instead of double ones. This can be solved with a quick character class and a back reference as below.
$text = preg_replace( '/<img(.*)src=([\'"])(.*)\2(.*)>/Ui', '<img\1src=\2phpThumb?src=\3\2\4>', $text );
Note that we have to renumber the backreferences in the replacement as we now have an extra capturing subpattern (the quote character is now \2).
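The sketch below tries the quote-aware pattern on two made-up tags: one using single quotes, and one whose double-quoted value contains an apostrophe. The second case shows why the closing quote is matched with the backreference rather than a second character class. The inputs and variable names are illustrative assumptions.

```php
<?php
$pattern     = '/<img(.*)src=([\'"])(.*)\2(.*)>/Ui';
$replacement = '<img\1src=\2phpThumb?src=\3\2\4>';

// Single-quoted attributes: \2 captures ' and is reused as the closing quote.
$single       = "<img alt='logo' src='/images/logo.png' />";   // hypothetical input
$singleResult = preg_replace( $pattern, $replacement, $single );

// An apostrophe inside a double-quoted value does not end the match,
// because the closing quote must be the same character that opened it.
$mixed       = '<img src="it\'s.jpg" />';                       // hypothetical input
$mixedResult = preg_replace( $pattern, $replacement, $mixed );

echo $singleResult, "\n"; // <img alt='logo' src='phpThumb?src=/images/logo.png' />
echo $mixedResult, "\n";  // <img src="phpThumb?src=it's.jpg" />
```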
I have saved the most invasive solution until the end so that you can see how regular expressions work in a normal situation first, but now we have to look at it. At the moment this solution won't handle dynamic URLs properly. Take the following example.
Here's an example <img src="dynamicImage.php?param1=yea&param2=blah" />
After our replace runs we will have the following.
Here's an example <img src="phpThumb?src=dynamicImage.php?param1=yea&param2=blah" />
This will pass two GET parameters to the phpThumb page: src with the value "dynamicImage.php?param1=yea" and param2 with the value "blah". This is obviously not correct, as the param2 parameter should be sent to the dynamicImage.php page to help generate the image. To fix this bug we need to run urlencode on the src value. This cannot be done in place with preg_replace, so we have to use another function, preg_replace_callback, and a user-written function. preg_replace_callback calls a user-defined function each time a match is found and uses its return value as the replacement. Here's how it works.
// $matches[1] and [4] hold the other attributes, [2] the quote character, [3] the original src.
function replaceImageTag( $matches )
{
    return '<img' . $matches[1] . 'src=' . $matches[2] . 'phpThumb?src=' . urlencode( $matches[3] ) . $matches[2] . $matches[4] . '>';
}
$text = preg_replace_callback( '/<img(.*)src=([\'"])(.*)\2(.*)>/Ui', 'replaceImageTag', $text );
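To see the end result, here is a self-contained sketch that runs the callback version over the dynamic image example. As a variation, it passes an anonymous function instead of the function name. That is my own choice for keeping the example in one piece, not a requirement. It then uses parse_str to check how the encoded query string would be split.

```php
<?php
$text = '<img src="dynamicImage.php?param1=yea&param2=blah" />';

$text = preg_replace_callback(
    '/<img(.*)src=([\'"])(.*)\2(.*)>/Ui',
    // Same body as replaceImageTag, written inline as a closure.
    function ( $matches ) {
        return '<img' . $matches[1] . 'src=' . $matches[2] . 'phpThumb?src='
            . urlencode( $matches[3] ) . $matches[2] . $matches[4] . '>';
    },
    $text
);

echo $text, "\n";
// <img src="phpThumb?src=dynamicImage.php%3Fparam1%3Dyea%26param2%3Dblah" />

// Split the encoded query string the way a GET request would be parsed.
parse_str( 'src=' . urlencode( 'dynamicImage.php?param1=yea&param2=blah' ), $params );
var_dump( count( $params ) ); // int(1)
echo $params['src'], "\n";    // dynamicImage.php?param1=yea&param2=blah
```

With the value encoded, phpThumb receives a single src parameter that still contains the full original URL, including param2.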
Hope this helps.
Rob