[RESOLVED] Regex help needed: my code is ugly (it works, I just want it cleaner)

kender · Jul 2, 2011

Okay, I have a serialized array which gives me data like this:

";i:938;s:35:"	<li data-id="widgets" class="">
";i:939;s:38:"		<span class="name">Widgets</span>
";i:940;s:34:"		<span class="value">136</span>
";i:941;s:38:"	<span class="clear"><!-- --></span>
";i:942;s:8:"	</li>
";i:943;s:35:"	<li data-id="pogs" class="">
";i:944;s:38:"		<span class="name">Pogs</span>
";i:945;s:34:"		<span class="value">169</span>
";i:946;s:38:"	<span class="clear"><!-- --></span>
";i:947;s:8:"	</li>

I use the following PHP to pull the widgets block out of that segment of the page. (I would use the line number, except I can't guarantee it will always be the same for this field.)

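$file		= "http://example.com/datasheet";
$contents	= file($file);			// array of lines, each keeping its trailing newline
$s_contents	= serialize($contents);	// flatten the array into a single string

// the parentheses act as the pattern delimiters; this grabs the "widgets" line plus the next three lines
$base_widgets_pattern = '(data+.+widgets+.+\n+.+\n+.+\n+.+span)';
preg_match($base_widgets_pattern, $s_contents, $matches);
$base_widgets = serialize($matches);
$base_wid = str_replace("\n", "", $base_widgets);	// remove the newlines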

which gives me these lines:

";i:938;s:35:"	<li data-id="widgets" class="">
";i:939;s:38:"		<span class="name">Widgets</span>
";i:940;s:34:"		<span class="value">136</span>
";i:941;s:38:"	<span class="clear"><!-- --></span>

Then I use this code to get the "value" amount:

$widgets_pattern = '(<span class="value">+.+</span>)';
preg_match($widgets_pattern, $base_wid, $wid_matches);
$widgets = strip_tags($wid_matches[0]);	// drop the <span> tags, leaving only the number

(returns 136 in this case)
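If you want to stay with regex, you don't need file(), serialize() or two passes: a single pattern run against the raw HTML can capture the number directly. This is a minimal sketch that assumes the markup keeps exactly the layout shown above (attribute order and quoting included). `valueForId()` is a hypothetical helper name, and any regex on HTML will break as soon as that layout changes.

```php
<?php
// Sample markup from the post above; in real use this would come from
// file_get_contents("http://example.com/datasheet").
$html = <<<'HTML'
<li data-id="widgets" class="">
    <span class="name">Widgets</span>
    <span class="value">136</span>
    <span class="clear"><!-- --></span>
</li>
<li data-id="pogs" class="">
    <span class="name">Pogs</span>
    <span class="value">169</span>
    <span class="clear"><!-- --></span>
</li>
HTML;

/**
 * Hypothetical helper: return the text of <span class="value"> inside
 * <li data-id="$id">, or null if no such item exists.
 * Assumes the exact attribute layout shown above.
 */
function valueForId(string $html, string $id): ?string
{
    $pattern = '~<li\s+data-id="' . preg_quote($id, '~') . '"[^>]*>' // opening <li> for this id
             . '(?:(?!</li>).)*?'                                     // skip ahead, never past this </li>
             . '<span\s+class="value">([^<]*)</span>~s';              // capture the value text
    return preg_match($pattern, $html, $m) ? trim($m[1]) : null;
}

echo valueForId($html, 'widgets'), "\n"; // 136
echo valueForId($html, 'pogs'), "\n";    // 169
```

The `(?:(?!</li>).)*?` part keeps the match inside one list item, so asking for an id that isn't there returns null instead of picking up a neighbouring item's value.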

NogDog · Jul 2, 2011

Seems to me it would be much cleaner to forgo the serialized array and regexp stuff, and just use the DOM extension to grab what you want.

$text = file_get_contents($file);
$dom = new DOMDocument();
libxml_use_internal_errors(true); // don't print libxml warnings about the page's markup
$dom->loadHTML($text);
libxml_clear_errors();
$value = null;
$items = $dom->getElementsByTagName('li');
foreach($items as $item) {
   if ($item->hasAttribute('data-id') and $item->getAttribute('data-id') == 'widgets') {
      $spans = $item->getElementsByTagName('span');
      foreach($spans as $span) {
         if ($span->hasAttribute('class') and $span->getAttribute('class') == 'value') {
            $value = $span->textContent;
            break;
         }
      }
      break; // stop after the widgets item, not after the first item that has any data-id
   }
}
if($value !== null) {
   echo $value;
}
else {
   echo "Not Found!";
}
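The same lookup can also be written as a single XPath query with DOMXPath, which replaces the nested loops and doesn't depend on where the widgets item sits in the page. This is a sketch that assumes the DOM extension is available; the sample page is built from the fragment in the first post, with the pogs item deliberately placed first.

```php
<?php
// Minimal page built from the first post's fragment; in real use
// $text would come from file_get_contents($file).
$text = <<<'HTML'
<!DOCTYPE html>
<html><body><ul>
<li data-id="pogs" class="">
    <span class="name">Pogs</span>
    <span class="value">169</span>
</li>
<li data-id="widgets" class="">
    <span class="name">Widgets</span>
    <span class="value">136</span>
</li>
</ul></body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$dom->loadHTML($text);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Every <span class="value"> that sits inside <li data-id="widgets">
$nodes = $xpath->query('//li[@data-id="widgets"]//span[@class="value"]');

$value = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $value ?? 'Not Found!', "\n"; // 136
```

Note that `@class="value"` is an exact string comparison, so it would miss an element carrying several classes such as `class="value big"`.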

kender · Jul 2, 2011

If I print_r($items), I get:

DOMNodeList Object
(
)

If I print_r($dom), I get:

DOMDocument Object
(
)

If I print_r($text), I do get the full page contents as HTML, though.

kender · Jul 2, 2011

echo $dom->saveHTML();

outputs the HTML,

but print_r($items) still shows an empty object.

kender · Jul 2, 2011

$items = $dom->getElementsByTagName('li');
echo $items->length;

returns: 116

kender · Jul 2, 2011

Not sure why I am not getting any visible results.

NogDog · Jul 2, 2011

kender;10983582 wrote:
if I print_r $items I get
DOMNodeList Object
(
)
if I print_r $dom I get
DOMDocument Object
(
)
if i print_r $text I do get the full page contents though as html

Unfortunately, that's all you ever get for those types of objects: they don't display all their properties the way user-defined objects do in PHP. You have to iterate through the NodeList object to get anything useful out of it.
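To see what a DOMNodeList actually contains, read its `length` property, loop over it with foreach, or fetch entries by position with `item()`. Here is a short sketch on a made-up two-item list:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><ul><li data-id="widgets">W</li><li data-id="pogs">P</li></ul></body></html>');

$items = $dom->getElementsByTagName('li');

// print_r($items) may not show the nodes themselves, but length does:
echo $items->length, "\n"; // 2

// Walk the list with foreach...
$ids = [];
foreach ($items as $item) {
    $ids[] = $item->getAttribute('data-id');
}
echo implode(', ', $ids), "\n"; // widgets, pogs

// ...or index it directly with item(), which is zero-based.
echo $items->item(1)->textContent, "\n"; // P
```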

kender · Jul 2, 2011

With your code I get "Not Found!" rather than the echoed $value.
Trying to walk my way through it, I am not finding any results.

NogDog · Jul 3, 2011

kender;10983603 wrote:
in your code, the "Not Found!" resulted not the echo of a $value
trying to walk my way through it, i am not finding results

Well, I did actually test it with the HTML fragment you provided (after wrapping it with enough mark-up to make it a valid HTML document).
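Testing against a fragment this way means wrapping it in enough markup for loadHTML() to treat it as a page. The sketch below does that and collects every `data-id` value in one pass, so the result doesn't depend on which item comes first. `valuesById()` is a hypothetical helper, and the `<ul>` wrapper is an assumption about the surrounding markup.

```php
<?php
// Fragment as posted (minus the serialization prefixes); not a full document.
$fragment = <<<'HTML'
<li data-id="widgets" class="">
    <span class="name">Widgets</span>
    <span class="value">136</span>
</li>
<li data-id="pogs" class="">
    <span class="name">Pogs</span>
    <span class="value">169</span>
</li>
HTML;

/**
 * Hypothetical helper: wrap an HTML fragment in a minimal document and map
 * each <li data-id="..."> to the text of its first <span class="value">.
 */
function valuesById(string $fragment): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // ignore parser warnings
    // Wrap the fragment so libxml sees a complete document
    $dom->loadHTML('<!DOCTYPE html><html><body><ul>' . $fragment . '</ul></body></html>');
    libxml_clear_errors();

    $result = [];
    foreach ($dom->getElementsByTagName('li') as $li) {
        if (!$li->hasAttribute('data-id')) {
            continue; // not one of the data items
        }
        foreach ($li->getElementsByTagName('span') as $span) {
            if ($span->getAttribute('class') === 'value') {
                $result[$li->getAttribute('data-id')] = trim($span->textContent);
                break; // only the first value span per item
            }
        }
    }
    return $result;
}

$values = valuesById($fragment);
echo $values['widgets'] ?? 'Not Found!', "\n"; // 136
```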

kender · Jul 3, 2011

Not sure what I had wrong, but it is working now. Thanks for the help, and sorry for saying it didn't work.
