Hi guys...
I am developing a web crawler that reads the HTML of several websites, and I want to extract specific pieces of information from that HTML, but so far I have not been able to. I will post an example to see if you guys can come up with any ideas.
Below is part of the HTML from one of the websites; I highlighted in red what I need to extract from it.
Another problem is that I need to crawl and extract the same information from other websites made by this same company. However, they don't use identical templates, and the location and boundaries of the information vary from site to site. Is there a way to write an intelligent script that finds the info regardless of where it sits in the page?
Thank you for any help.
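To make the idea concrete, here is a minimal sketch of what I imagine, assuming Python and only its standard library. Instead of relying on where the price sits in the markup, it collects every text node and recognises the price by its shape (dots for thousands, a comma for decimals, a trailing €). The names `PRICE_RE`, `BBCODE_RE`, `TextCollector` and `extract_listing` are made up for illustration, and the price pattern is only an assumption based on the one example I have, so it may well need adjusting for the other sites.

```python
import re
from html.parser import HTMLParser

# Assumed price shape: dot thousands separators, comma decimals, euro sign.
PRICE_RE = re.compile(r"\d{1,3}(?:\.\d{3})*,\d{2}\s*€")
# Forum highlight markup such as [COLOR="red"]...[/COLOR], stripped before matching.
BBCODE_RE = re.compile(r"\[/?[A-Za-z]+(?:=[^\]]*)?\]")


class TextCollector(HTMLParser):
    """Collects the non-empty text nodes of a page, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        cleaned = BBCODE_RE.sub("", data).strip()
        if cleaned:
            self.texts.append(cleaned)


def extract_listing(html):
    """Return the plain text fields and any prices found in an HTML fragment."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    prices = [m.group(0) for t in collector.texts for m in PRICE_RE.finditer(t)]
    fields = [t for t in collector.texts if not PRICE_RE.search(t)]
    return {"fields": fields, "prices": prices}


# Shortened copy of the fragment from this post, without the highlight markup.
sample = """
<div style="padding-left: 5px; color:#af251b">
<strong><img src="i/bullet_v.gif" width="8" height="9" border="0"> Vende-se</strong><br>
<img src="i/bullet_v.gif" width="8" height="9" border="0"> Terreno<br>
<img src="i/bullet_v.gif" width="8" height="9" border="0">
Chaves
<br><img src="i/bullet_c.gif" width="8" height="8" border="0">
<span style=" color:#666666"> 70.000,00 €</span>
</div>
"""

result = extract_listing(sample)
print(result)
```

Since this matches on the content and not on the surrounding tags, I would hope the same function survives small template differences between the sites, but it will also pick up any other text on the page that happens to look like a price.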
<div style="padding-left: 5px; color:#af251b">
<strong><img src="i/bullet_v.gif" width="8" height="9" border="0"> Vende-se</strong><br>
<img src="i/bullet_v.gif" width="8" height="9" border="0"> Terreno<br>
<img src="i/bullet_v.gif" width="8" height="9" border="0">
Chaves
<br><img src="i/bullet_c.gif" width="8" height="8" border="0">
<span style=" color:#666666"> [COLOR="red"]70.000,00 €[/COLOR]</span>
<table border="0" cellspacing="0" cellpadding="0" style="padding-top:2px ">
<tr>
<td><img src="i/left.gif" width="4" height="12"></td>
<td bgcolor="#666666" style="padding-left:3px; padding-right:3px;"><a href="?pagina=imovel.asp&lingua=0&tipo=1&query=&ordena=&tipoOrdem=&opcao=2048&posicao=1&idImovel=2181&idLoja=Sede" class="txt6">mais info</a></td>
<td><img src="i/right.gif" width="4" height="12">
</td>
</tr>
</table>
</div>
</td> <td valign=top>
<div style="padding-left: 0px; ">
<table border="0" cellspacing="0" cellpadding="0" width="130">
<tr>
<td width="100%"><div style="border:0px solid #c0c0db; width:130; height:100; left:0px; top:0px; overflow:hidden;">
<a target='_parent' title='Ref. nº 3430' href='?pagina=imovel.asp&lingua=0&tipo=1&query=&ordena=&tipoOrdem=&opcao=2048&posicao=2&idImovel=1945&idLoja=Sede'><img src='/imo3510100953/i/Sede/i/1945/1.jpg_s.jpg' width=140 border=0></a>