Could someone take a stab at this and help me clean it up? It parses the WPFileName and WPFileType (along with other things, as you'll see) out of WP51 files. As sloppy as it is, it works MOST of the time. I'm having problems with one portion of it, however. I'll explain.
// Constants
$wpdatetime = '[0-9]{2}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}[ap]';
// File
$filedata = file_get_contents($file);
// WP Data Extractor
$pattern = '/('.$wpdatetime.')\x20{10}\x00(.{68})(.{20})\x00{7}\xFF(.*\xA9\x20([0-9]{0,5}))?/i';
if (!preg_match($pattern, $filedata, $matchdata[]))
{
    echo "$file - Missed it!<br>";
}
Explaining it (and my likely mixed-up logic at 3:47 am!): right after the date/time stamp, WordPerfect 5.1 saves files with the following predictable pattern:
10 spaces (\x20{10}) -->
1 NULL (\x00) -->
88 characters I want in two variables, 68 + 20 (this works!) -->
7 NULLS (\x00{7}) -->
1 hFF (\xFF) -->
-- here is where the trouble starts --
Any number of unknown characters... 90-120 or so ... followed by some data I want with the pattern...
\xA9\x20([0-9]{0,5})
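To make the layout above concrete, here is a minimal sketch that builds a synthetic record byte by byte and runs the pattern over it. `make_wp51_record` and its sample values are hypothetical and not taken from a real WP51 file. The second record deliberately puts a newline byte (0x0A) in the junk: without the `s` modifier, `.` does not match a newline, which is one possible (unconfirmed) reason a binary file could slip past `.*`.

```php
<?php
// Hypothetical helper: builds bytes following the layout described above.
function make_wp51_record(string $junk, string $digits): string
{
    return '01-02-93 03:47a'          // date/time stamp (sample value)
         . str_repeat("\x20", 10)     // 10 spaces
         . "\x00"                     // 1 NULL
         . str_pad('LETTER.WP', 68)   // 68-character field (sample value)
         . str_pad('Memo', 20)        // 20-character field (sample value)
         . str_repeat("\x00", 7)      // 7 NULLs
         . "\xFF"                     // 1 hFF
         . $junk                      // unknown bytes
         . "\xA9\x20" . $digits;      // marker plus the digits wanted
}

$wpdatetime = '[0-9]{2}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}[ap]';
$pattern = '/('.$wpdatetime.')\x20{10}\x00(.{68})(.{20})\x00{7}\xFF(.*\xA9\x20([0-9]{0,5}))?/i';

// Junk without a newline byte: the optional tail is captured.
preg_match($pattern, make_wp51_record(str_repeat("\x01", 100), '12345'), $plain);

// Junk containing 0x0A: the pattern still matches, but the optional tail is skipped.
$junk = str_repeat("\x01", 50) . "\n" . str_repeat("\x01", 49);
preg_match($pattern, make_wp51_record($junk, '12345'), $withNewline);

var_dump(isset($plain[5]), isset($withNewline[5]));
```

Because the trailing group is optional, `preg_match` reports a match in both cases; only the presence of the digit capture tells the two apart.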
How in the h*** do I "skip" over the junk in some way I am not thinking of? I have ONE file (yes, one, but there is something wrong in the regex) that my regex does not work on, despite it seemingly (here's MY error) matching the pattern.
Ideas??? Clean-ups?? 😛 😕
(.*\xA9\x20([0-9]{0,5}))?
That extra set of ()? enclosing this is JUST to get the WHOLE pattern to match even if it somehow MISSES that particular data, as it does on the one file I'm having problems with!
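One possible cleanup, offered as a sketch under assumptions rather than a confirmed fix: add the `s` modifier so `.` also matches newline bytes, replace the greedy `.*` with a lazy, bounded `.{0,200}?` so the skip stops at the first `\xA9\x20` followed by digits, and make the optional wrapper non-capturing. The function name `wp51_extract`, the array keys, and the 200-byte ceiling (chosen to sit above the "90-120 or so" estimate) are all hypothetical.

```php
<?php
// Hypothetical helper, not from the original code.
function wp51_extract(string $data): ?array
{
    $wpdatetime = '[0-9]{2}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}[ap]';
    // "s": "." matches any byte, including 0x0A.
    // ".{0,200}?": lazy and bounded (200 is an assumed upper limit on the junk).
    // "(?:...)?": optional tail without an extra capture group.
    $pattern = '/(' . $wpdatetime . ')\x20{10}\x00(.{68})(.{20})\x00{7}\xFF'
             . '(?:.{0,200}?\xA9\x20([0-9]{1,5}))?/is';

    if (!preg_match($pattern, $data, $m)) {
        return null; // fixed header not found at all
    }
    return [
        'datetime' => $m[1],
        'name'     => $m[2],          // raw 68-character field
        'type'     => $m[3],          // raw 20-character field
        'number'   => $m[4] ?? null,  // null when the tail is missing
    ];
}

// Usage with made-up bytes; the junk includes a newline on purpose.
$sample = '01-02-93 03:47a' . str_repeat(' ', 10) . "\x00"
        . str_pad('LETTER.WP', 68) . str_pad('Memo', 20)
        . str_repeat("\x00", 7) . "\xFF"
        . str_repeat("\x01", 60) . "\n" . str_repeat("\x02", 40)
        . "\xA9 9876";
$info = wp51_extract($sample);
echo $info['number'] ?? 'no number found';
```

Requiring at least one digit (`{1,5}` instead of `{0,5}`) keeps a stray `\xA9\x20` in the junk from ending the skip early with an empty capture; whether that is safe depends on the real files.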