This could be done with a single regexp, though as your friend's
example shows, it can get really hairy really quick (though I have to
say that example looks unnecessarily hairy, and I can see cases where it
would fail). To make it simpler, I'm going to do some extra preparation.
This is done for each $line of the file.
Since every $line is the same length, and each field starts at the same
column on every line, we know where a field starts without having to
use a regexp to find it. This is a good thing, because if one of the
fields is missing, we'd have to try and figure out which one it was.
This is what makes a one-regexp approach hairy.
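To see the problem concretely, here's a small made-up example: two line fragments (an 8-character stock number, a 2-character year and a 2-character make, one space between columns), the second with the year left blank. Splitting on whitespace can't tell which field went missing, while cutting at fixed columns can. Fields like the carline, which can contain spaces of their own, make whitespace splitting even less reliable.

```php
<?php
// Made-up fragments: an 8-character stock number, a 2-character year
// and a 2-character make, separated by single spaces.
$full    = sprintf('%-8s %-2s %-2s', 'A1234', '98', 'FO');
$no_year = sprintf('%-8s %-2s %-2s', 'A1234', '', 'FO');

// Splitting on runs of whitespace throws the column positions away:
$a = preg_split('/\s+/', $full);    // ['A1234', '98', 'FO']
$b = preg_split('/\s+/', $no_year); // ['A1234', 'FO']: is 'FO' the year or the make?

// Cutting at fixed offsets keeps each field in its own column:
$yr = trim(substr($no_year, 9, 2));  // '' (the year is blank)
$mk = trim(substr($no_year, 12, 2)); // 'FO'
```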
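$columns = array(0 => 8, 9 => 2, 12 => 2, 15 => 15, 31 => 20, 52 => 15, 68 => 17, 86 => 10, 97 => 7, 105 => 9, 115 => 4); // start column => width, assuming one space between columns
$slice = function ($start, $width) use ($line) { return (string) substr($line, $start, $width); }; // cut a field out at its fixed position (sscanf's %s would stop at the first space)
list($stockno, $yr, $mk, $carline, $model_description, $color,
$serialno, $list_price, $miles, $inv_amt, $days) = array_map($slice, array_keys($columns), $columns);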
Okay, but most of these will have extra padding spaces still in them.
Let's trim them all down.
$stockno = trim($stockno);
$yr = trim($yr);
$mk = trim($mk);
$carline = trim($carline);
$model_description = trim($model_description);
$color = trim($color);
$serialno = trim($serialno);
$list_price = trim($list_price);
$miles = trim($miles);
$inv_amt = trim($inv_amt);
$days = trim($days);
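If you'd rather keep the fields in an array than in eleven separate variables, one array_map() call trims them all. A small sketch; $fields and the sample values are made up for illustration:

```php
<?php
// Made-up raw fields, still carrying their padding.
$fields = array(
    'stockno' => 'A1234   ',
    'yr'      => '98',
    'miles'   => '  42000',
);

// With a single input array, array_map() keeps the string keys,
// so each name still points at its (now trimmed) value.
$fields = array_map('trim', $fields);
```

extract($fields) would turn the array back into $stockno, $yr and so on if you need separate variables, though keeping the array is usually tidier.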
Now, on lines that aren't the records we want, some of these fields will
contain junk. We know the format of each field, so we can use regexps to
recognise good values.
// Each pattern is anchored with ^ and $ so the whole field has to match, not just its end.
$stockno_good = preg_match('/^\w+$/', $stockno); // The stock number (required)
$yr_good = ($yr === '' || preg_match('/^\d\d$/', $yr)); // The year might not be available, in which case $yr is empty
$mk_good = ($mk === '' || preg_match('/^\w\w$/', $mk)); // The make (optional)
$carline_good = ($carline === '' || preg_match('/^[\w ]+$/', $carline)); // Carline (which may contain spaces)
$model_description_good = ($model_description === '' || preg_match('/^[\w ]+$/', $model_description));
$color_good = ($color === '' || preg_match('/^[\w ]+$/', $color));
$serialno_good = preg_match('/^\w+$/', $serialno); // Serial number (required)
// Incidentally, if serial numbers are always 17 characters long, /^\w{17}$/ would do.
$list_price_good = ($list_price === '' || preg_match('/^\d+\.\d{2}$/', $list_price)); // Some digits, a decimal point, and exactly two more digits
$miles_good = ($miles === '' || preg_match('/^\d+$/', $miles)); // Odometer reading
$inv_amt_good = ($inv_amt === '' || preg_match('/^\d+\.\d{2}$/', $inv_amt)); // Same format as the list price
$days_good = ($days === '' || preg_match('/^\d+$/', $days));
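Since every check has the same shape (a pattern, plus whether the field may be empty), you could also drive them from a table. A sketch; the $rules table and fields_are_valid() are names I made up, and the patterns are anchored with ^ and $ so the whole field has to match:

```php
<?php
// Hypothetical rule table: field name => [pattern, required?].
$rules = [
    'stockno'           => ['/^\w+$/', true],
    'yr'                => ['/^\d\d$/', false],
    'mk'                => ['/^\w\w$/', false],
    'carline'           => ['/^[\w ]+$/', false],
    'model_description' => ['/^[\w ]+$/', false],
    'color'             => ['/^[\w ]+$/', false],
    'serialno'          => ['/^\w+$/', true],
    'list_price'        => ['/^\d+\.\d{2}$/', false],
    'miles'             => ['/^\d+$/', false],
    'inv_amt'           => ['/^\d+\.\d{2}$/', false],
    'days'              => ['/^\d+$/', false],
];

// Returns true only if every field passes its rule.
// Optional fields may be empty; required ones may not.
function fields_are_valid(array $fields, array $rules): bool
{
    foreach ($rules as $name => [$pattern, $required]) {
        $value = $fields[$name] ?? '';
        if ($value === '') {
            if ($required) {
                return false;
            }
            continue;
        }
        if (preg_match($pattern, $value) !== 1) {
            return false;
        }
    }
    return true;
}
```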
Now we have a valid line if all the fields are good:
if($stockno_good && $yr_good && $mk_good && $carline_good &&
$model_description_good && $color_good && $serialno_good &&
$list_price_good && $miles_good && $inv_amt_good && $days_good)
{
    // Do whatever is required; the information from the line is available in
    // $stockno, $yr, $mk, $carline, $model_description, $color, $serialno,
    // $list_price, $miles, $inv_amt, and $days.
}
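Putting it all together, here's a sketch of the whole per-line job as one function, with a single table giving each field's width and pattern. parse_inventory_line(), INVENTORY_FIELDS and the file name in the usage comment are hypothetical, and the offsets again assume one space between columns:

```php
<?php
// Hypothetical table, in column order: field => [width, pattern, required?].
// Offsets are worked out from the widths, assuming one space between columns.
const INVENTORY_FIELDS = [
    'stockno'           => [8,  '/^\w+$/',        true],
    'yr'                => [2,  '/^\d\d$/',       false],
    'mk'                => [2,  '/^\w\w$/',       false],
    'carline'           => [15, '/^[\w ]+$/',     false],
    'model_description' => [20, '/^[\w ]+$/',     false],
    'color'             => [15, '/^[\w ]+$/',     false],
    'serialno'          => [17, '/^\w+$/',        true],
    'list_price'        => [10, '/^\d+\.\d{2}$/', false],
    'miles'             => [7,  '/^\d+$/',        false],
    'inv_amt'           => [9,  '/^\d+\.\d{2}$/', false],
    'days'              => [4,  '/^\d+$/',        false],
];

// Returns the trimmed fields as an array, or null if the line
// isn't one of the records we want.
function parse_inventory_line(string $line): ?array
{
    $record = [];
    $pos = 0;
    foreach (INVENTORY_FIELDS as $name => [$width, $pattern, $required]) {
        // (string) covers PHP 7, where substr() past the end returns false.
        $value = trim((string) substr($line, $pos, $width));
        $pos += $width + 1; // step over the field and the space after it
        $ok = ($value === '') ? !$required : preg_match($pattern, $value) === 1;
        if (!$ok) {
            return null;
        }
        $record[$name] = $value;
    }
    return $record;
}

// Usage ('inventory.txt' is a placeholder name):
// foreach (file('inventory.txt', FILE_IGNORE_NEW_LINES) as $line) {
//     $record = parse_inventory_line($line);
//     if ($record !== null) {
//         // use $record['stockno'], $record['yr'], ...
//     }
// }
```

Keeping widths and patterns in one table means a layout change only has to be made in one place.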