Can anyone help?
What I am trying to do:
I've got a .htm file, converted from a Word document, that is uploaded to a PHP page. I am trying to extract all the appropriate sections. The particular part I need help with is extracting numbered lists (and bullet points, but that's not part of the problem).
Example result output snippet from the code:
3. Personal Details
4. Recruitment, Transfers and Promotions
5. Placement Students
Chapter 3: Compensation and Benefits
1. Catering
2. Compensation
3. BCA Car Ownership Scheme
This is the line that I am having problems with:
if (strstr($matches[$i],'·') XOR (preg_match("#\d+.{1,1}#",$matches[$i]))) {
It is supposed to select only lines that contain a bullet point dot, or a number (one or more digits) followed by a single dot. However, as you can see from the output snippet, it has also included 'Chapter 3: ...', which is very wrong. Any clues?
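For reference, here is a small standalone check of that pattern against lines from the output snippet. It is only a sketch: the `$anchored` pattern is a guess at what the test was meant to be (an escaped dot, anchored to the start of the line), and it assumes the number is the first non-space thing on a cleaned line. In the original pattern the unescaped `.` matches any character, so the `3:` in 'Chapter 3:' satisfies it.

```php
<?php
// Sample lines copied from the output snippet above.
$lines = [
    '3. Personal Details',
    'Chapter 3: Compensation and Benefits',
    '1. Catering',
];

// Pattern from the post: the unescaped "." matches ANY character,
// and nothing ties the digits to the start of the line.
$original = '#\d+.{1,1}#';

// Assumed alternative (not from the post): optional leading whitespace,
// one or more digits, then a literal dot, anchored at the start.
$anchored = '#^\s*\d+\.#';

$results = [];
foreach ($lines as $line) {
    $results[$line] = [
        'original' => preg_match($original, $line) === 1,
        'anchored' => preg_match($anchored, $line) === 1,
    ];
}

foreach ($results as $line => $r) {
    printf(
        "%-40s original=%s anchored=%s\n",
        $line,
        $r['original'] ? 'yes' : 'no',
        $r['anchored'] ? 'yes' : 'no'
    );
}
```

If the real lines still carry leading markup or non-breaking spaces after cleaning, the anchored version would need adjusting to allow for that.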
Any help would be most appreciated.
Qubit
This is the statement above in context:
<pre>
// check for bullet points in the document (dots or numbers)
if (strstr($matches[$i],'·') XOR (preg_match("#\d+.{1,1}#",$matches[$i]))) {
    while ($i < count($matches) && (strstr($matches[$i],'·') XOR preg_match('#[0-9]{1,2}.?#i',$matches[$i])) && !empty($matches[$i])) {
        //if (preg_match('#[0-9]{1,2}.?#i',$matches[$i])) echo 'is number';
        $tmpStorage .= $matches[$i].'<br>';
        $i++;
        // this is a patch for a problem that I couldn't
        // sort out properly
        $matches[$i] = scrubClean($matches[$i]);
        while (empty($matches[$i])) {
            // echo $i.' is empty';
            $i++;
            $matches[$i] = scrubClean($matches[$i]);
        }
        $matches[$i] = scrubClean($matches[$i]);
        // echo 'Matches at '.$i.' = '.$matches[$i].'<br>';
    }
    $dbContents[$dbCount] = $tmpStorage;
    // echo $i;
    echo ' Bullet<br>'.$dbContents[$dbCount]."\n\n";
    $tmpStorage = "";
    $dbCount++;
    // check for pics that are supposed to be inserted
}
</pre>
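One possible way to restructure the loop above, offered only as a rough sketch. `scrubCleanStub()`, `isListLine()` and `collectList()` are made-up names for illustration; the real `scrubClean()` is not shown here, so the stub just strips tags and trims. It uses a plain OR rather than XOR (an assumption about the intent, since XOR rejects a line that has both a bullet and a number), and it checks the array bounds before every read so the blank-skipping loop cannot run off the end of `$matches`.

```php
<?php
// Hypothetical stand-in for scrubClean(); the real function is not shown.
function scrubCleanStub(string $s): string
{
    return trim(strip_tags($s));
}

// True when the line contains a middle-dot bullet or starts with
// digits followed by a literal dot. Assumes the bullet character in
// this source file uses the same encoding as the document text.
function isListLine(string $line): bool
{
    return strpos($line, '·') !== false
        || preg_match('#^\s*\d+\.#', $line) === 1;
}

// Collect consecutive list lines starting at index $i, skipping blank
// entries, and never reading past the end of $matches.
function collectList(array $matches, int &$i): string
{
    $out = '';
    $n = count($matches);
    while ($i < $n) {
        $line = scrubCleanStub($matches[$i]);
        if ($line === '') {      // skip empty entries
            $i++;
            continue;
        }
        if (!isListLine($line)) {
            break;               // first non-list line ends the block
        }
        $out .= $line . '<br>';
        $i++;
    }
    return $out;
}

// Example input (invented for this sketch).
$matches = ['1. Catering', '', '2. Compensation', 'Chapter 4: Other', '3. Later'];
$i = 0;
$list = collectList($matches, $i);
echo $list, "\n";
```

With this shape, `$i` is left pointing at the first line that is not part of the list, so the surrounding code can carry on from there.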