I'm processing files that have some user names associated with numbers buried somewhere in their body.
Sample File:
McEntee, head of the American Federation of State, County and Municipal Employees, delivered the
news in a meeting in Burlington, Vt.
Dean did stand to gain some of Washington’s 76 delegates
1 1 stripme<^vs[5
2 1 <wtf<getit
3 1 SER<RES
supporters he must win to remain in the race.
Edwards, should expect to finish second at best to Kerry in Tuesday’s primaries in Tennessee or
The lines that I'm interested in extracting are these:
1 1 stripme<^vs[5
2 1 <wtf<getit
3 1 SER<RES
My little program was working just fine until yesterday, when I noticed that for lines like 2 and 3, where a "<" is followed by an alphabetic character, it does not get the entire line.
Here is my sample script file:
<?php if ($HTTP_POST_VARS['action']) { ?>
<HTML>
<HEAD>
<TITLE>File Upload Results</TITLE>
</HEAD>
<BODY BGCOLOR="WHITE" TEXT="BLACK">
<P><FONT FACE="Arial, Helvetica, sans-serif"><FONT SIZE="+1">File Upload
Results</FONT><BR><BR>
<?php
$uploadpath = '/path/to/store/uploaded/files/';
$source = $HTTP_POST_FILES['file1']['tmp_name'];
$dest = '';
if ( ($source != 'none') && ($source != '' )) {
$fp = fopen( $source , "r" );
$contents = fread($fp, filesize($source));
fclose($fp);
preg_match_all( "/[0-9]\s+[0-9]\s.*/",$contents, $matches);
for ($i=0; $i< count($matches[0]); $i++)
{
echo "<br>";
echo "matched: ".$matches[0][$i]."<br>";
}
} else {
echo 'File not supplied, or file too big.<BR>';
}
?>
<BR><A HREF="<?php echo $PHP_SELF ?>">Back</A>
</FONT></P>
</BODY>
</HTML>
<?php } else { ?>
<FORM METHOD="POST" ENCTYPE="multipart/form-data"
ACTION="<?php echo $PHP_SELF;?>">
<INPUT TYPE="HIDDEN" NAME="MAX_FILE_SIZE" VALUE="800000">
<INPUT TYPE="HIDDEN" NAME="action" VALUE="1">
File 1: <INPUT TYPE="FILE" NAME="file1" SIZE="30"><BR><BR>
<INPUT TYPE="SUBMIT" VALUE="Upload">
</FORM>
<?php } ?>
This is the crucial line of code:
preg_match_all( "/[0-9]\s+[0-9]\s.*/",$contents, $matches);
When I run the script, the output looks like this:
matched: 1 1 stripme<^vs[5
matched: 2 1
matched: 3 1 SER
As I understand it, the ".*" tells it to match every character up to the line feed. Like I said, it works fine except when a "<" is followed by an alphabetic character.
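One way to narrow this down is to separate what the pattern matches from what the browser displays. The sketch below is a standalone check, not part of the original script: the $contents string is an illustrative stand-in for the uploaded file. It runs the same pattern and prints each match with its raw length, escaped through htmlspecialchars(). The assumption being tested is that sequences like "<wtf" and "<RES" look like the start of an HTML tag, so a browser may hide them even though the match itself is complete.

```php
<?php
// Illustrative stand-in for the uploaded file contents (assumption:
// plain "\n" line endings, as in the sample file shown above).
$contents = "news in a meeting in Burlington, Vt.\n"
          . "1 1 stripme<^vs[5\n"
          . "2 1 <wtf<getit\n"
          . "3 1 SER<RES\n"
          . "supporters he must win to remain in the race.\n";

// Same pattern as in the script above.
preg_match_all('/[0-9]\s+[0-9]\s.*/', $contents, $matches);

foreach ($matches[0] as $line) {
    // strlen() reports the size of the raw match, independent of rendering.
    // htmlspecialchars() turns "<" into "&lt;" so a browser shows it as text.
    echo strlen($line), ' chars: ', htmlspecialchars($line), "<br>\n";
}
```

If the reported lengths cover the full lines, the pattern is doing its job and only the display step needs escaping. Viewing the page source of the original script's output should show the same thing.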
Any help would be greatly appreciated. I'm starting to think this is a failure in PHP's implementation of regular expressions, but I'm hoping it's just some trivial thing that I'm missing.
Thanks in advance!
-Eros