I just "mastered" the following task:
Extract the src attributes of all img tags from a directory of HTML files, to see which images are used and which aren't.
I solved it rather clumsily:
<?php
// Collect all .htm files in the current directory.
$handle = opendir('.');
$is_files = array();
while (($file = readdir($handle)) !== false):
    if (substr($file, -4) == ".htm"):
        $is_files[] = $file;
    endif;
endwhile;
closedir($handle);
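// Read each file and collect the src value of every img tag.
$allpics = array();
for ($f = 0; $f < sizeof($is_files); $f++):
    // Read the whole file into a string.
    $str = file_get_contents($is_files[$f]);
    // Keep only the img tags, then split on whitespace and commas.
    $str = strip_tags($str, "<img>");
    $pattern = "/[\s,]+/";
    $strr = preg_split($pattern, $str);
    // Use < rather than <= so the loop does not read past the end of the array.
    for ($d = 0; $d < sizeof($strr); $d++):
        if (substr($strr[$d], 0, 5) == "src=\""):
            $allpics[] = substr($strr[$d], 5, -1);
        endif;
    endfor;
endfor;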
// Sort the list, then drop adjacent duplicates.
sort($allpics);
$sortpics = array();
$oldpic = "";
for ($d = 0; $d < sizeof($allpics); $d++):
    if ($allpics[$d] != $oldpic):
        $sortpics[] = $allpics[$d];
        $oldpic = $allpics[$d];
    endif;
endfor;
// Print one image source per line.
for ($d = 0; $d < sizeof($sortpics); $d++):
    echo $sortpics[$d]."<br>";
endfor;
?>
Now I'm wondering whether there is a regular expression that could do all of that?
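To show the kind of thing I'm after, here is a rough sketch using preg_match_all. The pattern and the helper name extract_img_sources are my own guesses, not something I've tried on real-world HTML. It assumes every src value is quoted with " or ' and would miss unquoted attributes:

```php
<?php
// Hypothetical helper: returns the sorted, unique src values of all
// <img> tags found in an HTML string.
// Assumption: src values are quoted with " or '; unquoted ones are skipped.
function extract_img_sources($html)
{
    // Group 1 captures the opening quote, group 2 the src value;
    // \1 requires the same quote character to close the value.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    $sources = array_unique($matches[2]);
    sort($sources);
    return $sources;
}

// Scan every .htm and .html file in the current directory.
$htm = glob('*.htm');
$html = glob('*.html');
$files = array_merge($htm ? $htm : array(), $html ? $html : array());

$allpics = array();
foreach ($files as $file) {
    $allpics = array_merge($allpics, extract_img_sources(file_get_contents($file)));
}

// Remove duplicates across files and print one source per line.
$allpics = array_unique($allpics);
sort($allpics);
foreach ($allpics as $pic) {
    echo $pic . "<br>\n";
}
```

If that is roughly right, array_unique and sort would replace my hand-written duplicate check, and the pattern would replace the strip_tags and preg_split steps.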
/chris