I found some information regarding screen scraping and have this code working.
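<?php
// Fetch the page, then strip whitespace and <BR> tags so the regexes can match across lines.
$url = "http://link/number.HTM";
$raw = file_get_contents($url);
if ($raw === false) {
    die("Could not fetch $url");
}
$newlines = array("\t", "\n", "\r", "\x20\x20", "\0", "\x0B", "<BR>");
$content = str_replace($newlines, "", html_entity_decode($raw));

// Cut out the 800px layout table that holds the record.
$marker = '<table width="800" border="0" cellspacing="0" cellpadding="0">';
$start = strpos($content, $marker);
if ($start === false) {
    die("Table not found in $url");
}
$end = strpos($content, '</table>', $start) + 8; // 8 = strlen('</table>')
$table = substr($content, $start, $end - $start);

// Print the text of the first cell of every row that is not a header row.
preg_match_all("|<tr(.*)</tr>|U", $table, $rows);
foreach ($rows[0] as $row) {
    if (strpos($row, '<th') === false) {
        preg_match_all("|<td(.*)</td>|U", $row, $cells);
        if (!empty($cells[0])) {
            echo strip_tags($cells[0][0]) . "<br>\n";
        }
    }
}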
My problem is that the information in this table is not separated into columns. Instead, everything sits in a single cell, like this:
<table width="800" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="630" valign="top">
<BODY><A NAME="Top"></A>
<H1>{COMPANY NAME}<BR> {COMPANY DETAIL}</H1><HR>
<H3><A NAME="0000">Main Company - Name & Address</A></H3>
<P><STRONG><A HREF="LINK">{COMPANY NAME}</A></STRONG><STRONG><A HREF="LINK"></A></STRONG><BR>{STREET}<BR>{CITY STATE, ZIP}<BR>Phone:{PHONE}<BR>Fax:{FAX}<BR>Toll Free:{PHONE 2}<BR>E-Mail: <A HREF="mailto:EMAIL">{EMAIL}</A></P>
<H3>Product Categories</H3>{CAT1}<BR>{CAT2}<BR>{CAT3}<BR>{CAT4}<HR>
<H3><A NAME="Menu">Menu Options:</A></H3>
<P><A HREF="LINK#Top">{MENU LINK}</A></P>
<P><A HREF="LINK#Top"> {MENU LINK}</A></P>
<P><A HREF="LINK#Top"> {MENU LINK}</A></P>
<P><A HREF="LINK">{HOME}</A></P><HR></td>
</table>
What I need to do is format the information, but I don't know how. I would like one line per company, like this:
{COMPANY NAME},{STREET},{CITY STATE, ZIP},{PHONE},{FAX},{PHONE 2},{EMAIL},{CAT1 - CAT??}
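One way this might be approached, as a rough sketch only: the fields in the cell are separated by <BR> tags, so splitting on <BR> (instead of stripping it out first) gives one piece per field. The function name parse_company and the array keys are made up for illustration. The sketch assumes every page follows the layout shown above (the same "Name & Address" heading and the same Phone:/Fax:/Toll Free:/E-Mail: labels), and it joins the categories with "; " into one column, which is only a guess at what {CAT1 - CAT??} should look like.

```php
<?php
// Hypothetical helper (not part of the original script): pulls the fields out
// of the raw HTML of one company page. Pass it $raw, before <BR> is stripped.
function parse_company($html)
{
    $fields = array('name' => '', 'street' => '', 'city' => '', 'phone' => '',
                    'fax' => '', 'tollfree' => '', 'email' => '', 'categories' => '');

    // Address block: the first <P>...</P> after the "Name & Address" heading.
    if (preg_match('~Name &(?:amp;)? Address.*?<P>(.*?)</P>~is', $html, $m)) {
        $labels = array('Phone' => 'phone', 'Fax' => 'fax',
                        'Toll Free' => 'tollfree', 'E-Mail' => 'email');
        $plain = array(); // unlabelled lines, assumed to be name, street, city
        foreach (preg_split('~<BR\s*/?>~i', $m[1]) as $part) {
            $text = trim(html_entity_decode(strip_tags($part)));
            if (preg_match('~^(Phone|Fax|Toll Free|E-Mail):\s*(.*)$~s', $text, $kv)) {
                $fields[$labels[$kv[1]]] = $kv[2]; // labelled line
            } elseif ($text !== '') {
                $plain[] = $text;
            }
        }
        $fields['name']   = isset($plain[0]) ? $plain[0] : '';
        $fields['street'] = isset($plain[1]) ? $plain[1] : '';
        $fields['city']   = isset($plain[2]) ? $plain[2] : '';
    }

    // Categories: everything between the "Product Categories" heading and the next <HR>.
    if (preg_match('~Product Categories</H3>(.*?)<HR~is', $html, $m)) {
        $cats = array();
        foreach (preg_split('~<BR\s*/?>~i', $m[1]) as $part) {
            $text = trim(html_entity_decode(strip_tags($part)));
            if ($text !== '') {
                $cats[] = $text;
            }
        }
        $fields['categories'] = implode('; ', $cats);
    }
    return $fields;
}
```

Calling parse_company($raw) would return an array with one entry per column. Because the city line contains a comma, the array is probably better written with fputcsv() than joined by hand with implode(',').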
Also in the first code section there is this:
$url = "http://www.link.com/number.HTM";
I know the range of numbers used for the "number.HTM" part of the link. Is it possible to have the code loop through that known list of numbers, or through a number range, and output every record in the format I want, all on the same page?
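A loop over the number range could look roughly like the sketch below. Everything in it is an assumption for illustration: export_range is a made-up name, $fetch stands for whatever downloads page number $n (returning false when the page is missing), and $parse stands for whatever turns one page's HTML into an array of fields. Rows are written with fputcsv() so that values containing commas, such as the city line, get quoted.

```php
<?php
// Hypothetical driver (not part of the original script).
// $fetch: callable taking a page number, returning the HTML or false.
// $parse: callable taking the HTML, returning an array of fields.
// $out:   an open, writable stream such as fopen('companies.csv', 'w').
function export_range($first, $last, $fetch, $parse, $out)
{
    $written = 0;
    for ($n = $first; $n <= $last; $n++) {
        $html = call_user_func($fetch, $n);
        if ($html === false) {
            continue; // skip numbers that have no page
        }
        $row = call_user_func($parse, $html);
        if (!empty($row)) {
            // Delimiter, enclosure and escape character are passed explicitly.
            fputcsv($out, $row, ',', '"', '\\');
            $written++;
        }
    }
    return $written; // number of CSV rows written
}

// Example wiring, left commented out. The URL is the placeholder from the
// question, the range 1..1000 is invented, and my_parser is a placeholder name:
//
// $fetch = function ($n) {
//     return @file_get_contents("http://www.link.com/" . $n . ".HTM");
// };
// $out = fopen('php://output', 'w'); // print the CSV on the page itself
// export_range(1, 1000, $fetch, 'my_parser', $out);
```

Writing to php://output keeps everything on one page as asked, while a file name in fopen() would produce a CSV file ready to import into the database.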
Basically, what I am attempting to do is create CSV output so I can move these records to a database for my company. Otherwise I have to hand-type somewhere near 1000 records and then choke the living &$# out of the moron who never put them in a database to begin with. These are years of records on our intranet site that need to be moved over.
Please help so I don't go to jail... LOL