Hello, I've got another parsing question, this time to do with HTML. There's a site with a list of IPs/DNS names that I'm trying to get. I've seen some examples of how to do this using preg_match_all and preg_match, but I'm not sure how I would apply them to the HTML entries I'm trying to get. Here's the HTML code for 2 items in the list:

<tr id="row1"><td><a href="a22456.html">DNS.com</a></td><td>omg 2.1</td><td>100 (187) / 200</td><td>91.41%</td><td>182</td><td><a href="http://link.list.com"><img src="images/a/links.png" alt="links"/></td></tr><tr id="row1"><td><a href="c28164.html">81.23.12.43</a></td><td>server 1.1</td><td>94 (100) / 100</td><td>23.21%</td><td>83</td><td><a href="http://link.secondlist.com"><img src="images/a/secondlist.png" alt="link"/></td></tr>

As you can see, there's a pattern. There's always a link before the IP/DNS, going to a page that gives more info about that IP/DNS, i.e.

<a href="a22456.html">DNS.com</a>

So I'm not sure how I would go about getting just the IPs/DNS names from the page list. There are about 40 on each page, and there are 7 or 8 pages. I was thinking that once someone shows me how to get just the IP/DNS, I could add in the links to each page and run the parser when I need it.

So all I'm asking is how to parse a remote site and grab the IPs/DNS names from the above HTML code. Many thanks.

    Try this one:

    preg_match_all('|<a href="[a-zA-Z0-9]+.html">(.*)</[^>]+>|U',$str,$tmp_arr);
      bogu wrote:

      Try this one:

      preg_match_all('|<a href="[a-zA-Z0-9]+.html">(.*)</[^>]+>|U',$str,$tmp_arr);

      So ummmm how do I use this on a remote page though?

        $str = '<tr id="row1"><td><a href="a22456.html">DNS.com</a></td><td>omg 2.1</td><td>100 (187) / 200</td><td>91.41%</td><td>182</td><td><a href="http://link.list.com"><img src="images/a/links.png" alt="links"/></td></tr><tr id="row1"><td><a href="c28164.html">81.23.12.43</a></td><td>server 1.1</td><td>94 (100) / 100</td><td>23.21%</td><td>83</td><td><a href="http://link.secondlist.com"><img src="images/a/secondlist.png" alt="link"/></td></tr>';
        preg_match_all('|<a href="[a-zA-Z0-9]+.html">(.*)</[^>]+>|U',$str,$tmp_arr);
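        // preg_match_all always fills $tmp_arr with (possibly empty) sub-arrays,
        // so check the captured group itself: $tmp_arr[1] holds the link text.
        if (!empty($tmp_arr[1])) {
        	echo "<pre>",print_r($tmp_arr[1],1),"</pre>";
        }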

        But if you're not used to working with regular expressions, why don't you use string functions?
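For what it's worth, the string-function route could look roughly like the sketch below. The helper name extract_hosts_with_strings is made up for illustration, and it assumes the layout shown in the question: the first link inside each <tr> is the IP/DNS link. Treat it as a starting point, not a finished parser.

```php
<?php
// Hypothetical helper (not from the thread): collects the text of the first
// link in every table row using only strpos() and substr().
// Assumption: the first <a href="..."> after each <tr is the IP/DNS link.
function extract_hosts_with_strings(string $html): array
{
    $hosts = [];
    $offset = 0;

    // Walk the document one <tr at a time.
    while (($row = strpos($html, '<tr', $offset)) !== false) {
        // First link that follows the start of this row.
        $link = strpos($html, '<a href="', $row);
        if ($link === false) {
            break;
        }
        // End of the opening tag: the closing quote of href followed by '>'.
        $start = strpos($html, '">', $link);
        if ($start === false) {
            break;
        }
        $start += 2;
        // The link text runs up to the closing </a>.
        $end = strpos($html, '</a>', $start);
        if ($end === false) {
            break;
        }
        $hosts[] = trim(substr($html, $start, $end - $start));
        // Continue after this link, so the next search lands on the next row.
        $offset = $end;
    }

    return $hosts;
}
```

Calling it on the two sample rows from the first post should give back just the two entries, DNS.com and 81.23.12.43, because the second link in each row is skipped when the search jumps to the next <tr.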

          Yeah, but what about a remote page? How do I go to the web page, get the source, and then parse it? Do I have to do something like

           
          $url = "http://test.com";
          $str = file_get_contents($url); // returns false if the page can't be fetched
          if ($str === false) {
              die("Could not fetch $url");
          }
          preg_match_all('|<a href="[a-zA-Z0-9]+.html">(.*)</[^>]+>|U',$str,$tmp_arr);
          if (!empty($tmp_arr[1])) {
              echo "<pre>",print_r($tmp_arr[1],1),"</pre>";
          }
          

          To put it simply, I don't know the best method to parse out the IPs/DNS names from the lists. That's why I made the post.
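One possible way to put the pieces together for several pages is sketched below. The function names parse_hosts and fetch_hosts and the page URLs are placeholders invented for illustration, not anything from the site in question. The pattern is the one suggested earlier in the thread with two small changes (the dot before html is escaped, and [^<]+ replaces the ungreedy (.*)), and file_get_contents on a URL assumes allow_url_fopen is enabled in php.ini.

```php
<?php
// Hypothetical helper: returns the IP/DNS link text found in one page of HTML.
function parse_hosts(string $html): array
{
    // Matches links such as <a href="a22456.html">DNS.com</a> and captures
    // the text between the tags. Links to full URLs do not match because
    // the href may only contain letters and digits before ".html".
    preg_match_all('|<a href="[a-zA-Z0-9]+\.html">([^<]+)</a>|', $html, $matches);
    return $matches[1];
}

// Hypothetical helper: downloads every page and merges the results.
function fetch_hosts(array $urls): array
{
    $hosts = [];
    foreach ($urls as $url) {
        // Assumption: allow_url_fopen is on, so URLs can be read like files.
        $html = file_get_contents($url);
        if ($html === false) {
            error_log("Could not fetch $url");
            continue; // skip this page and carry on with the rest
        }
        $hosts = array_merge($hosts, parse_hosts($html));
    }
    // Drop duplicates in case an entry appears on more than one page.
    return array_values(array_unique($hosts));
}

// Placeholder URLs: replace them with the real list pages before running.
// $hosts = fetch_hosts(['http://test.com/?page=1', 'http://test.com/?page=2']);
// print_r($hosts);
```

Keeping the parsing separate from the downloading means the pattern can be checked against a saved copy of the HTML first, before pointing it at the live pages.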
