Problem:
Backreferences do not work when I try to extract text enclosed in either single or double quotes.
$img = "<IMG SRC=\"photos/123.jpg\"\r\n ALT=\"Dad's cat\">";
// wanted, but does not work
preg_match ( "/ALT\s*=\s*(['\"])(.*?)\2\s*/is", $img, $alt);
echo 'pattern1:'.$alt[2] ."\n";
// cheesy substitute, does not always do the job
preg_match ( "/ALT\s*=\s*['\"](.*?)['\"]\s*/is", $img, $alt);
echo 'pattern2:'.$alt[1] ."\n";
Pattern 1 matches nothing.
Pattern 2 returns only Dad (it stops at the ' in Dad's).
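A possible explanation, offered as a sketch rather than a confirmed diagnosis: two things may be working against pattern 1. First, inside a double-quoted PHP string, \2 is read as an octal escape, so PCRE receives a control character instead of a backreference. Second, the quote character is captured by group 1, not group 2, so the backreference would need to be \1. The snippet below assumes a sample tag with a forward slash in the path (a backslash followed by digits would itself be read as an octal escape) and shows both a single-quoted and a double-quoted way of writing the pattern.

```php
<?php
// Sample tag; the forward slash in the path is an assumption made for this sketch.
$img = "<IMG SRC=\"photos/123.jpg\"\r\n ALT=\"Dad's cat\">";

// Single-quoted pattern: PHP passes \1 through to PCRE untouched.
// Group 1 captures the opening quote, \1 requires the same closing quote,
// and group 2 holds the attribute value.
if (preg_match('/ALT\s*=\s*([\'"])(.*?)\1/is', $img, $alt)) {
    echo 'alt: ' . $alt[2] . "\n";
}

// Double-quoted equivalent: every backslash meant for PCRE is doubled,
// so the regex engine still receives \s and \1.
if (preg_match("/SRC\\s*=\\s*(['\"])(.*?)\\1/is", $img, $src)) {
    echo 'src: ' . $src[2] . "\n";
}
```

With this approach the apostrophe in Dad's no longer ends the match, because only the same quote character that opened the value can close it.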
Versions:
Windows NT 4.0 SP5
PHP/4.0.4pl1
Situation:
I am writing some code that will examine some .htm files for IMG tags and then generate a photo index. The IMG ALT= value will be displayed and will link to the HTML file containing the image reference.
Code:
There is no handy HTML parsing library for PHP like Perl's HTML::Parser (by Gisle Aas, who has no plans to work on a PHP version), so I had to do some quick'n'dirty preg_ coding.
//---------------------------------------
// look for .htm files
//---------------------------------------
$html_files = array();
$handle = opendir ('.');
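// compare against false explicitly so a file named "0" does not end the loop
while ( false !== ($file = readdir ($handle)) ) {
// escape the dot so only a literal ".htm" suffix matches
if ( eregi ( "\\.htm$", $file )) {
array_push ($html_files, $file);
}
}
closedir ($handle);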
//---------------------------------------
// generate the photo links sub array
//---------------------------------------
$photos = array ();
foreach ($html_files as $file) {
// extract each image tag
$rawHtml = implode("", file ($file));
preg_match_all ("/<IMG(.*?)>/is", $rawHtml, $IMG);
foreach ($IMG[0] as $img) {
// extract img tag src value
// the backreference \2 does not work!
preg_match ("/SRC\s*=\s*(['\"])(.*?)\2\s*/is", $img, $src);
// group 1 is the quote character, group 2 is the attribute value
$src = $src[2];
if (preg_match ( "#^photos/#i", $src )) {
// extract img tag alt value
preg_match ("/ALT\s*=\s*(['\"])(.*?)\2\s*/is", $img, $alt);