Hi all! Current situation:

I'm trying to parse a DomDocument with XPath; the result should be an array of categories and subcategories.
The problem is that the person who made the HTML did not nest the subcategories inside their main categories; they are delimited only by CSS classes.

The HTML looks like this:

      
<div class="menu_item">Main Category AC</div>
<div class="submenu_div">
    <a href="http://www.link.com/313"> <div class="sub_item"> <h3>Sub Categ A</h3> </div> </a>
    <a href="http://www.link.com/475"> <div class="sub_item"> <h3>Sub Categ B</h3> </div> </a>
    <a href="http://www.link.com/321"> <div class="sub_item"> <h3>Sub Categ C</h3> </div> </a>
</div>
<div class="menu_item">Main Category BC</div>
<div class="submenu_div">
    <a href="http://www.link.com/313"> <div class="sub_item"> <h3>Sub Categ X</h3> </div> </a>
    <a href="http://www.link.com/475"> <div class="sub_item"> <h3>Sub Categ Y</h3> </div> </a>
    <a href="http://www.link.com/321"> <div class="sub_item"> <h3>Sub Categ Z</h3> </div> </a>
</div>

Now, with this PHP I can extract the categories and subcategories, but the result is just a flat list: I don't know which subcategory belongs to which category, and I'm stuck.
How can I use XPath to extract each main category's subcategories and assign a parent to every subcategory?

        
$doc = new DomDocument;
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//div[@class="menu_item"]|//div[@class="submenu_div"]/a/div/h3') as $e) {
    echo $e->nodeValue, "<br />\n";
}

    One way to do it is to run two separate XPath queries, one for the menu divs and one for the submenu divs:

    $queryMain = '//div[@class="menu_item"]';
    $querySub = '//div[@class="submenu_div"]';
    

    The two result lists should be of equal length, which lets you consume both at the same time.
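    As a rough sketch of that pairing (not tested against the real page): the shortened `$html` sample, the `$menu` array layout and its `name`/`href` keys are my own assumptions, and `libxml_use_internal_errors()` is used here in place of the `@` operator.

```php
<?php
// Shortened sample in the same shape as the HTML from the question.
$html = '<div class="menu_item">Main Category AC</div>'
      . '<div class="submenu_div">'
      . '<a href="http://www.link.com/313"><div class="sub_item"><h3>Sub Categ A</h3></div></a>'
      . '<a href="http://www.link.com/475"><div class="sub_item"><h3>Sub Categ B</h3></div></a>'
      . '</div>'
      . '<div class="menu_item">Main Category BC</div>'
      . '<div class="submenu_div">'
      . '<a href="http://www.link.com/313"><div class="sub_item"><h3>Sub Categ X</h3></div></a>'
      . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$mains = $xpath->query('//div[@class="menu_item"]');
$subs  = $xpath->query('//div[@class="submenu_div"]');

// The pairing only makes sense if every menu item has exactly one submenu div.
if ($mains->length !== $subs->length) {
    throw new Exception('Menu and submenu counts differ');
}

$menu = [];
for ($i = 0; $i < $mains->length; $i++) {
    $category = trim($mains->item($i)->textContent);
    $menu[$category] = [];
    // Relative query: "./a" only looks at links directly inside the i-th submenu div.
    foreach ($xpath->query('./a', $subs->item($i)) as $a) {
        $menu[$category][] = [
            'name' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
}

print_r($menu);
```

    The second argument to `query()` is the context node, which is what keeps each submenu's links attached to the right category.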

    But doing this is inefficient, because each of those XPath queries must traverse the entire tree. Instead, you could query only for the submenu's h3 elements, which takes a single tree traversal. From each such node, you already know how to find the main menu item:
    1. Go 3 parents up the tree (from the h3 to the enclosing submenu_div)
    2. Go through previousSiblings until you find an element node with class="menu_item"

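        // Start at the sibling just before the enclosing submenu_div.
        $node = $nodeList->item($i)->parentNode->parentNode->parentNode->previousSibling;
        // Skip text nodes (whitespace) until an element node is found.
        // Test for null first, so a missing sibling is never dereferenced.
        while ($node !== null && $node->nodeType != XML_ELEMENT_NODE) {
            $node = $node->previousSibling;
        }
        if ($node === null) {
            throw new Exception('No main menu item found for submenu item ' . $i);
        }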
    

    Note: no check for proper css class is done. This code assumes the first previous sibling found of element node type is the corresponding main menu div.
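    For completeness, here is a self-contained sketch of the same idea that also performs the class check. The helper name `findMainItem`, the shortened `$html` sample and the `$menu` array layout are made up for illustration, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Shortened sample in the same shape as the HTML from the question.
$html = '<div class="menu_item">Main Category AC</div>'
      . '<div class="submenu_div">'
      . '<a href="http://www.link.com/313"><div class="sub_item"><h3>Sub Categ A</h3></div></a>'
      . '<a href="http://www.link.com/475"><div class="sub_item"><h3>Sub Categ B</h3></div></a>'
      . '</div>'
      . '<div class="menu_item">Main Category BC</div>'
      . '<div class="submenu_div">'
      . '<a href="http://www.link.com/313"><div class="sub_item"><h3>Sub Categ X</h3></div></a>'
      . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical helper: walk back through the previous siblings until the
// nearest <div class="menu_item"> is found; returns null if there is none.
function findMainItem(DOMNode $submenuDiv): ?DOMElement
{
    for ($n = $submenuDiv->previousSibling; $n !== null; $n = $n->previousSibling) {
        if ($n instanceof DOMElement && $n->getAttribute('class') === 'menu_item') {
            return $n;
        }
    }
    return null;
}

$menu = [];
// A single query for all submenu h3 elements.
foreach ($xpath->query('//div[@class="submenu_div"]/a/div/h3') as $h3) {
    $link       = $h3->parentNode->parentNode;   // h3 -> div.sub_item -> a
    $submenuDiv = $link->parentNode;             // a -> div.submenu_div
    $main       = findMainItem($submenuDiv);
    if ($main === null) {
        throw new Exception('No main menu item found for ' . trim($h3->textContent));
    }
    // Group each subcategory under the text of its main menu item.
    $menu[trim($main->textContent)][] = [
        'name' => trim($h3->textContent),
        'href' => $link->getAttribute('href'),
    ];
}

print_r($menu);
```

    Each subcategory ends up under its parent category's name, so the parent never has to be guessed from list positions.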

      Thanks for the reply! I will try to take parts of the code above and add them to the solution I ended up with, which is this:

         for ($i = 0; $i <= 25; $i++) {
             foreach ($xpath->query('//div[@class="menu_item"][' . $i . ']/text()') as $category) {
                 echo $i . " Category: " . $category->nodeValue . "<br/>\n";
                 foreach ($xpath->query('//div[@class="menu_item"][' . $i . ']/following-sibling::div[1][@class="submenu_div"]/a') as $subcategory) {
                     echo '-----' . $i . " Subcategory: " . $subcategory->nodeValue . "<br/>\n";
                     echo '-----' . $i . " Link: " . $subcategory->getAttribute("href") . "<br/>\n";
                 }
                 echo "<br/>";
             }
         }
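      A possible variant of that loop, as a sketch only: instead of counting up to a fixed 25, it iterates over the matched menu_item nodes and runs the same following-sibling expression relative to each one. The shortened `$html` sample and the name-to-link layout of `$menu` are assumptions for illustration.

```php
<?php
// Shortened sample in the same shape as the HTML from the question.
$html = '<div class="menu_item">Main Category AC</div>'
      . '<div class="submenu_div">'
      . '<a href="http://www.link.com/313"><div class="sub_item"><h3>Sub Categ A</h3></div></a>'
      . '<a href="http://www.link.com/475"><div class="sub_item"><h3>Sub Categ B</h3></div></a>'
      . '</div>'
      . '<div class="menu_item">Main Category BC</div>'
      . '<div class="submenu_div">'
      . '<a href="http://www.link.com/313"><div class="sub_item"><h3>Sub Categ X</h3></div></a>'
      . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$menu = [];
foreach ($xpath->query('//div[@class="menu_item"]') as $category) {
    $name = trim($category->textContent);
    $menu[$name] = [];
    // Same expression as in the loop above, but evaluated relative to the
    // current menu_item node, so no numeric index is needed.
    $links = $xpath->query('following-sibling::div[1][@class="submenu_div"]/a', $category);
    foreach ($links as $a) {
        // Map subcategory name => link.
        $menu[$name][trim($a->textContent)] = $a->getAttribute('href');
    }
}

print_r($menu);
```

      This way the number of categories does not have to be known in advance.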

      Anyway, thanks a lot for the effort!

        I still recommend my own solution. When parsing the HTML in your example document above (2 menu items, 6 submenu items), your code would traverse the entire DOM tree more than 30 times, while my code traverses it exactly once.

          I agree, but it gets the job done. That's one part of the problem; the second part is that I don't really know how to implement your code in my situation. I'm still learning PHP, but as I said before, I will keep trying until I get it right.

          thanks again 🙂
