Regular Expressions - Syntax Problem

Anon

Can anyone help?

What I am trying to do:
I've got a .htm file, converted from a Word document, that is uploaded to a PHP page. I am trying to extract all the appropriate sections. The particular part I need help with is extracting a numbered list (and bullet points, but that's not part of the problem).

Example result output snippet from the code:
3. Personal Details
4. Recruitment, Transfers and Promotions
5. Placement Students
Chapter 3: Compensation and Benefits
1. Catering
2. Compensation
3. BCA Car Ownership Scheme

This is the line that I am having problems with:
if (strstr($matches[$i],'·') XOR (preg_match("#^{\d+.{1,1}#",$matches[$i])))} {

It is supposed to select only lines that contain a bullet-point dot or start with a number followed by a single dot. However, as you can see from the output snippet, it has also included 'Chapter 3: ...', which is wrong. Any clues?

Any help would be most appreciated.

Qubit

This is the statement above in context:

<pre>
// check for bullet points in the document (dots or numbers)
if (strstr($matches[$i],'·') XOR (preg_match("#^{\d+.{1,1}#",$matches[$i])))} {
  while ($i < count($matches) && (strstr($matches[$i],'·') XOR preg_match('#[0-9]{1,2}.?#i',$matches[$i])) && !empty($matches[$i])) {
    //if (preg_match('#[0-9]{1,2}.?#i',$matches[$i])) echo 'is number';
    $tmpStorage .= $matches[$i].'<br>';
    $i++;
    // this is a patch for a problem that I couldn't
    // sort out properly
    $matches[$i] = scrubClean($matches[$i]);
    while (empty($matches[$i])) {
      // echo $i.' is empty';
      $i++;
      $matches[$i] = scrubClean($matches[$i]);
    }
    $matches[$i] = scrubClean($matches[$i]);
    // echo 'Matches at '.$i.' = '.$matches[$i].'<br>';
  }
  $dbContents[$dbCount] = $tmpStorage;
  // echo $i;
  echo ' Bullet<br>'.$dbContents[$dbCount]."\n\n";
  $tmpStorage = "";
  $dbCount++;

  // check for pics that are supposed to be inserted
}

</pre>
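
A possible cause, offered as a sketch rather than a confirmed diagnosis: the while condition inside the block uses a different pattern from the if line, namely #[0-9]{1,2}.?#i. That pattern is not anchored with ^, and its dot is unescaped and optional, so it matches any line that contains a digit anywhere, including 'Chapter 3: ...'. The snippet below compares it with an anchored pattern, #^\d+\.#, which is only an assumption about what was intended and is not taken from the original code. The sample lines come from the output snippet in the question.

```php
<?php
// Sample lines copied from the output snippet in the question.
$lines = [
    '5. Placement Students',
    'Chapter 3: Compensation and Benefits',
    '1. Catering',
];

// Pattern used in the inner while loop: no ^ anchor, and ".?" means
// "optionally any one character", not "a literal dot".
$loopPattern = '#[0-9]{1,2}.?#i';

// Assumed intent (hypothetical replacement): one or more digits at the
// very start of the line, followed by a literal dot.
$anchoredPattern = '#^\d+\.#';

foreach ($lines as $line) {
    $loose  = preg_match($loopPattern, $line);     // 1 if any digit occurs in the line
    $strict = preg_match($anchoredPattern, $line); // 1 only for "N." at the start
    echo $line, ' => loop pattern: ', $loose, ', anchored pattern: ', $strict, "\n";
}
```

With these three lines, the loop pattern reports a match for all of them, while the anchored pattern rejects the 'Chapter 3: ...' line. If the lines can carry leading whitespace after conversion from Word, the anchored pattern would need to allow for that, for example with \s* after the ^.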

Anon

Sorry, I can't spend more time analyzing this, but if you have the document on the web somewhere and want to see whether your pattern finds the string, go to

http://samuelfullman.com/team/php/tools/regular_expression_tester_p.php

and you can test any preg regular expression against any string in any file on the web. This lets you confirm whether or not the regex is the problem. This tool has been the 'final answer' for me in testing regex.

Note: you need the full URL, like:

http://amazon.com/ not amazon.com
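
The same kind of check can also be run locally without a web tool. The following is a minimal sketch, assuming a hypothetical helper named matchingLines() and a few sample lines copied from the question; it relies on PHP's built-in preg_grep() to list which lines a given pattern accepts. The second pattern is only an illustrative candidate, not something from the original code.

```php
<?php
/**
 * Hypothetical helper: return the lines that a PCRE pattern matches.
 * Throws if preg_grep() reports a failure (for example, a malformed pattern).
 */
function matchingLines(string $pattern, array $lines): array
{
    $hits = @preg_grep($pattern, $lines);
    if ($hits === false) {
        throw new InvalidArgumentException("Pattern could not be applied: $pattern");
    }
    // preg_grep() preserves the original keys; reindex for easier reading.
    return array_values($hits);
}

// Sample lines copied from the output snippet in the question.
$sample = [
    '4. Recruitment, Transfers and Promotions',
    'Chapter 3: Compensation and Benefits',
    '2. Compensation',
];

// Patterns to compare: the one from the while loop, and an anchored candidate.
$patterns = ['#[0-9]{1,2}.?#i', '#^\d+\.#'];

foreach ($patterns as $pattern) {
    echo $pattern, "\n";
    foreach (matchingLines($pattern, $sample) as $line) {
        echo '  ', $line, "\n";
    }
}
```

Running each candidate pattern over a handful of known-good and known-bad lines like this makes it easier to see whether the regex or the surrounding loop logic is responsible for an unexpected line.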