Okay, I have a serialized array which gives me data like this

";i:938;s:35:"	<li data-id="widgets" class="">
";i:939;s:38:"		<span class="name">Widgets</span>
";i:940;s:34:"		<span class="value">136</span>
";i:941;s:38:"	<span class="clear"><!-- --></span>
";i:942;s:8:"	</li>
";i:943;s:35:"	<li data-id="pogs" class="">
";i:944;s:38:"		<span class="name">Pogs</span>
";i:945;s:34:"		<span class="value">169</span>
";i:946;s:38:"	<span class="clear"><!-- --></span>
";i:947;s:8:"	</li>

I use the following PHP to pull the widgets segment shown above out of the page. (I would use the line number, except I can't guarantee the field will always be on the same line.)

$file		= "http://example.com/datasheet";
$contents	= file($file);
$s_contents	= serialize($contents);

$base_widgets_pattern = '(data+.+widgets+.+\n+.+\n+.+\n+.+span)';
preg_match($base_widgets_pattern, $s_contents, $matches);
$base_widgets = serialize($matches);
$base_wid = str_replace("\n", "", $base_widgets);

giving you these lines

";i:938;s:35:"	<li data-id="widgets" class="">
";i:939;s:38:"		<span class="name">Widgets</span>
";i:940;s:34:"		<span class="value">136</span>
";i:941;s:38:"	<span class="clear"><!-- --></span>

Then I use this line of code to get the "value" amount

$widgets_pattern = '(<span class="value">+.+</span>)';
preg_match($widgets_pattern, $base_wid, $wid_matches);
$widgets = strip_tags($wid_matches[0]);

(returns 136 in this case)

    Seems to me it would be much cleaner to forgo the serialized array and regexp stuff, and just use the DOM extension to grab what you want.

    $text = file_get_contents($file);
    $dom = new DOMDocument();
    $dom->loadHTML($text);
    $value = null;
    $items = $dom->getElementsByTagName('li');
    foreach($items as $item) {
       // Skip every <li> that is not the widgets entry; stop only once it has been found.
       if ($item->hasAttribute('data-id') and $item->getAttribute('data-id') == 'widgets') {
          $spans = $item->getElementsByTagName('span');
          foreach($spans as $span) {
             if ($span->hasAttribute('class') and $span->getAttribute('class') == 'value') {
                $value = $span->textContent;
                break;
             }
          }
          break;
       }
    }
    // Compare against null so that a legitimate value of "0" is still echoed.
    if($value !== null) {
       echo $value;
    }
    else {
       echo "Not Found!";
    }
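
    For comparison, here is a rough sketch of the same lookup done with DOMXPath instead of nested loops. It is only an illustration under a few assumptions: the sample markup is the fragment from the question wrapped in a minimal document, find_value() is a hypothetical helper name, and the id passed in is assumed not to contain double quotes.

```php
<?php
// Sample markup modelled on the fragment in the question, wrapped so it parses as a document.
$html = <<<HTML
<html><body><ul>
	<li data-id="widgets" class="">
		<span class="name">Widgets</span>
		<span class="value">136</span>
		<span class="clear"><!-- --></span>
	</li>
	<li data-id="pogs" class="">
		<span class="name">Pogs</span>
		<span class="value">169</span>
		<span class="clear"><!-- --></span>
	</li>
</ul></body></html>
HTML;

// Hypothetical helper: returns the text of the "value" span inside
// <li data-id="$id">, or null when no such element exists.
function find_value(string $html, string $id): ?string
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumes $id contains no double quotes (it is interpolated into the query).
    $query = sprintf('//li[@data-id="%s"]//span[@class="value"]', $id);
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

echo find_value($html, 'widgets') ?? 'Not Found!';
```

    The exact class match (@class="value") mirrors the comparison in the loop version, so it would miss an element carrying several classes.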
    

      if I print_r $items I get

      DOMNodeList Object
      (
      )

      if I print_r $dom I get

      DOMDocument Object
      (
      )

      If I print_r $text, I do get the full page contents as HTML, though.

        echo $dom->saveHTML();

        returns the html

        but print_r of the $items returns an empty array

          $items = $dom->getElementsByTagName('li');
          echo $items->length;

          returns: 116

            Not sure why I am not getting any visible results.

              kender;10983582 wrote:

              if I print_r $items I get

              DOMNodeList Object
              (
              )

              if I print_r $dom I get

              DOMDocument Object
              (
              )

              if i print_r $text I do get the full page contents though as html

              Unfortunately, that's all you ever get for those types of objects: they don't display all their properties the way user-defined objects do in PHP. You have to iterate through the NodeList object to get anything useful out of it.
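
              As a small sketch of what iterating through the NodeList can look like when debugging: the markup below is made up for the example, and $summary is just an illustrative variable. Each node is inspected through its own properties and methods rather than through print_r.

```php
<?php
$dom = new DOMDocument();
// Made-up markup, only for demonstration.
$dom->loadHTML('<html><body><ul><li data-id="widgets">Widgets</li><li data-id="pogs">Pogs</li></ul></body></html>');
$items = $dom->getElementsByTagName('li');

// Walk the list by index and describe each node by hand.
$summary = [];
for ($i = 0; $i < $items->length; $i++) {
    $item = $items->item($i);
    $summary[] = sprintf(
        '%d: <%s data-id="%s"> %s',
        $i,
        $item->nodeName,
        $item->getAttribute('data-id'),
        trim($item->textContent)
    );
}
echo implode("\n", $summary), "\n";

// saveHTML() also accepts a single node, which dumps just that element's markup.
echo $dom->saveHTML($items->item(0)), "\n";
```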

                With your code I get "Not Found!" rather than the echo of $value.
                I have tried walking my way through it, but I am not finding any results.

                  kender;10983603 wrote:

                  in your code, the "Not Found!" resulted not the echo of a $value
                  trying to walk my way through it, i am not finding results

                  Well, I did actually test it with the HTML fragment you provided (after wrapping it with enough mark-up to make it a valid HTML document). 🙂

                    Not sure what I had wrong, but it is working now. Thanks for the help, and sorry for saying it didn't work.
