I'm trying to implement a class that parses links out of an HTML string. However, I'm having trouble with references inside the callbacks for the element handlers. When it's done parsing, the array I use to store the links goes back to null. Why is that?
Here is my code:
class ParseHTML {
    var $parser;
    var $links;

    function ParseHTML() {
        $this->parser = &new XML_HTMLSax();
        $this->parser->set_object($this);
        $this->parser->set_element_handler('open', 'close');
        $this->parser->set_data_handler('data');
        $this->links = array();
    }

    function open($parser, $tag, $attr) {
        if($tag == "a") {
            if(count($attr) > 0) {
                $hashed_link = md5($attr['href']);
                $this->links[$hashed_link] = $attr['href'];
            }
        }
    }

    function close($parser, $tag) {
    }

    function parse($html) {
        $this->parser->parse($html);
        return $this->links;
    }
}
Can someone please tell me how to do this? I really need to return those links when the parsing is done; otherwise the parsing is useless. The referencing is getting confusing, and I can't tell whether the handlers are even accessing the right links array. How can I make sure it's the array from the actual calling ParseHTML instance and not an anonymous copy of the instance?
Thanks in advance.