Parsing a web page's results

trparky · Mar 14, 2003

I need a way to load an HTML file from a server, read it, and then parse it so that it extracts a number from between two HTML tags.

This is the setup:

I need to get the XXXXXXX out of this: <a HREF="/tracker/161242">XXXXXXX</a>
where XXXXXXX is a number.

The parsing must be able to find numbers in between those two tags no matter how big the number is.

Can anyone help me? I have never created a parsing engine.

thna_brianb · Mar 14, 2003

Do you need the number after the link is clicked, or as the page with the link on it loads?

diego25 · Mar 14, 2003

You can use preg_match to find that. Read more about regular expressions here
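For illustration, here is a minimal sketch of how preg_match could pull the number out of the sample link. The function name extract_tracker_number is made up for this example, and the pattern assumes the link always looks like the one in the first post (an href starting with /tracker/ and nothing but digits between the tags). The number is returned as a string so that very long numbers are not cut off by integer limits.

```php
<?php
// Hypothetical helper: returns the digits inside the first
// <a href="/tracker/...">NUMBER</a> link, or null if no such link is found.
function extract_tracker_number($html)
{
    // \d+ matches one or more digits, so the length of the number does not matter.
    // The "i" flag makes tag and attribute names case-insensitive (HREF vs href).
    $pattern = '/<a\s[^>]*href="\/tracker\/\d+"[^>]*>\s*(\d+)\s*<\/a>/i';
    if (preg_match($pattern, $html, $m)) {
        return $m[1]; // kept as a string to avoid integer overflow
    }
    return null;
}

$line = '<a HREF="/tracker/161242">3873</a>';
echo extract_tracker_number($line), "\n";
```

If the page holds several such links, preg_match_all with the same pattern should collect all of them in one call.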

Diego

thna_brianb · Mar 14, 2003

$test1 = preg_replace("'<[\/!]*?[^<>]*?>'si", "", $test1);

$test1 should then be the line with all html tags stripped out.
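As a rough sketch of that idea applied to the link from the first post: strip the tags, then check that what remains is only digits. The sample line and the $number variable are assumptions for the example, and the pattern is written with the * quantifiers it needs to match whole tags.

```php
<?php
// Sample line, assumed to hold exactly one link.
$test1 = '<a HREF="/tracker/161242">3873</a>';

// Remove anything that looks like an HTML tag.
$test1 = preg_replace("'<[\/!]*?[^<>]*?>'si", "", $test1);

// Keep the result only if what is left is purely digits.
$test1 = trim($test1);
$number = preg_match('/^\d+$/', $test1) ? $test1 : null;

echo $number, "\n";
```

This only works when the line contains nothing but the link; any other text on the line would survive the stripping and fail the digits check.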

statrat · Mar 14, 2003

First, look into Snoopy for fetching remote web pages, if that's what you're after.
http://snoopy.sourceforge.com

or try reading the manual on file functions.
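A minimal sketch of the file-function route, assuming allow_url_fopen is enabled so that file_get_contents can read a URL. The function name fetch_tracker_numbers and the example.com address are placeholders, not anything from the original question, and the pattern assumes the links look like the sample in the first post.

```php
<?php
// Hypothetical helper: loads a page and returns every number found
// inside <a href="/tracker/...">NUMBER</a> links, as strings.
function fetch_tracker_numbers($source)
{
    // file_get_contents() accepts a local path, or a URL when
    // allow_url_fopen is enabled in php.ini.
    $html = @file_get_contents($source);
    if ($html === false) {
        return array(); // page could not be read
    }

    // Capture group 1 holds the digits between the tags.
    $pattern = '/<a\s[^>]*href="\/tracker\/\d+"[^>]*>\s*(\d+)\s*<\/a>/i';
    preg_match_all($pattern, $html, $m);
    return $m[1];
}

// Placeholder URL; replace it with the real page.
// print_r(fetch_tracker_numbers('http://example.com/results.html'));
```

Returning an empty array on failure keeps the caller simple: it can loop over the result without a separate error check.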

now for probably the simplest, worst links regular expression/parser ever:

 
<?php

$html='<HTML><HEAD><TITLE> </TITLE></HEAD>';
$html.='<BODY><A href="/tracker/161242">3873</a>';
$html.='<a href="/tracker/161242">XXXX</A></BODY></HTML>';

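get_inner($html);

// Print the text inside every <a href=...>...</a> link, one per line.
function get_inner($input){

    // Lazy (.*?) plus the "i" flag: each link is matched separately, in any case.
    preg_match_all('/<a\s[^>]*href=[^>]*>(.*?)<\/a>/is', $input, $links);

    foreach($links[1] as $inner){
        echo strip_tags($inner) . "\n";
    }

}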

?>
