Hello all.
I had the idea of creating my own PHP search engine, which works with the data output by Google.
I found out that the only way to extract the data I want is by using regex ... oh god =)
I need serious help with this. Below is a single search result from Google.
<p><a href=http://www.php.net/><b>PHP</b>: Hypertext Preprocessor</a><br><font size=-1> <b>...</b> Contact. Please submit website bugs in the bug system. You can contact the webmaster<br> at webmaster@<b>php</b>.net. <b>PHP</b> Magazin out now! <b>...</b> New <b>PHP</b>.net URL Howto. <b>...</b>
What I want to get from Google is the URL, the title, and the description. So it can pretty much be broken down to this:
<p><a href=$URL>$TITLE</a><br><font size=-1> $DESCRIPTION
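To make the goal concrete, here is a rough sketch of what I imagine for this basic case only. The pattern, the variable names, and the shortened sample string are my own guesses, and it assumes the input holds exactly one result in exactly the layout above, so please treat it as a starting point rather than something that works on real pages.

```php
<?php
// Shortened copy of the sample result above; real Google markup may differ.
$html = '<p><a href=http://www.php.net/><b>PHP</b>: Hypertext Preprocessor</a>'
      . '<br><font size=-1> <b>...</b> Contact. Please submit website bugs in the bug system.';

// Guessed pattern for the basic layout only:
//   <p><a href=$URL>$TITLE</a><br><font size=-1> $DESCRIPTION
// i = case-insensitive, s = let "." match newlines as well.
// The last group is greedy, so this only makes sense for a single result.
$pattern = '~<p><a href=([^>\s]+)>(.*?)</a><br><font size=-1>\s*(.*)~is';

$result = null;
if (preg_match($pattern, $html, $m)) {
    $result = [
        'url'         => $m[1],
        // strip_tags() removes the <b> highlighting Google adds around keywords.
        'title'       => strip_tags($m[2]),
        'description' => trim(strip_tags($m[3])),
    ];
}

print_r($result);
```

The unquoted href value is captured as "everything up to the first space or >", which is an assumption based on the sample and would break if Google ever quoted the attribute.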
Easy? Well, not really... Sometimes the HTML is formatted differently, because Google may add a 'translate' link, like this:
<p><a href=$URL>$TITLE <font size=-1>- [ <a href=$TRANSLATE_URL... class=fl>Translate this page</a> ]</font><br><font size=-1> $DESCRIPTION
Also, in some cases a given search result has neither a title nor a description, so we get something like this:
<p><a href=$URL>$TITLE</a><br><font size=-1>...
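Putting the variations together, this is roughly the kind of thing I am hoping to end up with: one pattern where the 'translate' part is optional, applied to every result on the page. It is only a sketch under my own assumptions. The function name parse_results is made up, the example.com sample data is invented, and I am guessing that each result starts with <p><a href= and runs until the next one.

```php
<?php
// Hypothetical helper: returns one array per search result found in $html.
function parse_results(string $html): array
{
    $pattern = '~<p><a href=([^>\s]+)>'   // $URL (unquoted attribute value)
             . '(.*?)(?:</a>)?\s*'        // $TITLE, closing </a> treated as optional
             // optional "- [ Translate this page ]" block
             . '(?:<font size=-1>- \[ <a href=[^>]*>Translate this page</a> \]</font>)?'
             . '<br><font size=-1>\s*'
             // $DESCRIPTION: everything up to the next result or the end of input
             . '(.*?)(?=<p><a href=|$)~is';

    $results = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $results[] = [
                'url'         => $m[1],
                'title'       => trim(strip_tags($m[2])),
                'description' => trim(strip_tags($m[3])),
            ];
        }
    }
    return $results;
}

// Invented sample: one plain result and one with the translate link.
$sample = '<p><a href=http://www.php.net/><b>PHP</b>: Hypertext Preprocessor</a>'
        . '<br><font size=-1> Contact. Please submit website bugs.'
        . '<p><a href=http://example.com/fr/>Exemple</a> <font size=-1>- [ '
        . '<a href=http://example.com/translate?u=1 class=fl>Translate this page</a> ]</font>'
        . '<br><font size=-1> Une page.';

print_r(parse_results($sample));
```

The lookahead at the end is what stops each description before the next result begins. Anything Google formats differently from these three cases would simply be skipped, which is exactly the part I am unsure how to handle.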
I think those are the major exceptions, but there are surely more of them. I browsed through many of the regex tutorials on the net, and they did not help much. It seems like I'm trying to do something that is way beyond my skills right now, and that is why I'm asking for help on this board =)
Thanks for any clue or advice.