Hey there,
I am trying to write a script that parses/scrapes public data from another site.
At the moment, the script accepts a URL and appends the search terms to it as $_GET (query-string) variables to bring up the right search results. It then searches the page source for the information I need (business name, address, phone number) and stores the data in an array.
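Concatenating the search terms straight into the URL means a term containing a space or `&` can corrupt the query string. Below is a minimal sketch using PHP's built-in `http_build_query()`, which URL-encodes each value. `buildSearchUrl` is a hypothetical helper, and the `what`/`where` parameter names are taken from the script further down.

```php
<?php
// Hypothetical helper: builds the search URL from the two search terms.
// http_build_query() URL-encodes each value, so input such as
// "pizza & pasta" cannot break the query string.
function buildSearchUrl(string $base, string $what, string $where): string
{
    return $base . '?' . http_build_query(['what' => $what, 'where' => $where]);
}

// Usage (assumes the form posts fields named "what" and "where"):
// $url = buildSearchUrl('http://www.example.com/search/listings', $_POST['what'], $_POST['where']);
```

With the default settings, spaces are encoded as `+` and `&` as `%26`.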
The problem I have is this:
The site does not list an address for every business, so for every 40 business results there may be 40 or fewer addresses. Because of this, the business and address arrays fall out of step.
Essentially, Business 1 should go with Address 1, and so on, so $business[1] goes with $address[1]. When a listing has no address, the script simply skips to the next entry and takes that entry's address instead, so every later address shifts up by one:
$business[1] (has address) matches $address[1]
$business[2] (no address) does not match $address[2]
$business[3] matches $address[2]
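To see the misalignment concretely, here is a small sketch that runs one pattern per field across a whole page of three listings, with the second listing missing its address. The sample HTML is made up for illustration (shaped like the markup in this post), and note that PHP arrays start at index 0 rather than 1.

```php
<?php
// Made-up sample markup: three listings, the second one has no address.
$html = '<li class="listing"><span class="business">Business 1</span><span class="address">Address 1</span></li>'
      . '<li class="listing"><span class="business">Business 2</span></li>'
      . '<li class="listing"><span class="business">Business 3</span><span class="address">Address 3</span></li>';

// Matching each field across the whole page produces two independent lists.
preg_match_all('~<span class="business">(.*?)</span>~', $html, $b);
preg_match_all('~<span class="address">(.*?)</span>~', $html, $a);
$business = $b[1]; // three business names
$address  = $a[1]; // only two addresses

// Pairing by index now matches "Business 2" with "Address 3",
// and "Business 3" has nothing left to pair with.
foreach ($business as $i => $name) {
    echo $name, ' => ', $address[$i] ?? '(none)', "\n";
}
```

Nothing in the two flat arrays records which listing an address came from, so the pairing cannot be repaired after the fact.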
I need to split the page into one string per listing, each containing all of that listing's data, and then scrape each string separately.
In the site's source code, each listing is enclosed in <li class="listing"> [listing data here] </li>. Once the listings are extracted they end up in an array, and I can't work out how to scrape the data inside each one; my attempt to use the implode function doesn't seem to have worked.
<li class="listing">
<span class="business">Business Name</span>
<span class="address">Business Address</span>
</li>
<li class="listing">
<span class="business">Business Name</span>
</li>
<li class="listing">
<span class="business">Business Name</span>
<span class="address">Business Address</span>
</li>
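One way to keep each business paired with its own address is to loop over the listings and run the field patterns against one listing at a time, instead of imploding the listings back into a single string. This is a minimal sketch assuming markup shaped like the sample above; `parseListings` is a hypothetical name, and a missing address is recorded as `null`.

```php
<?php
// Hypothetical helper: returns one record per <li class="listing"> block,
// with 'address' set to null when that listing has no address span.
function parseListings(string $html): array
{
    // Grab each listing as its own string. The "s" modifier lets "." match
    // newlines, since a listing spans several lines of markup.
    preg_match_all('~<li class="listing[^>]*>(.*?)</li>~s', $html, $chunks);

    $records = [];
    foreach ($chunks[1] as $chunk) {
        // Search only inside this listing, so a missing address cannot
        // borrow the next listing's address.
        $business = preg_match('~<span class="business">(.*?)</span>~s', $chunk, $m) ? trim($m[1]) : null;
        $address  = preg_match('~<span class="address">(.*?)</span>~s', $chunk, $m) ? trim($m[1]) : null;
        $records[] = ['business' => $business, 'address' => $address];
    }
    return $records;
}

// Usage: foreach (parseListings($raw) as $r) { if ($r['address'] === null) { /* handle it */ } }
```

Because the patterns are tied to the exact class names, this will break if the site changes its markup.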
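<?php
$what  = $_POST['what'];
$where = $_POST['where'];
// urlencode() keeps spaces and special characters from breaking the query string
$url = "http://www.example.com/search/listings?what=" . urlencode($what) . "&where=" . urlencode($where);
$raw = file_get_contents($url);

// Return every substring that starts with $open and ends with $close.
function parseArray($string, $open, $close)
{
    // preg_quote() escapes regex metacharacters in the markers;
    // the "s" modifier lets "." match newlines, since listings span several lines
    $pattern = '~' . preg_quote($open, '~') . '(.+?)' . preg_quote($close, '~') . '~s';
    preg_match_all($pattern, $string, $matchingData);
    return $matchingData[0];
}

$rawListingsArray = parseArray($raw, '<li class="listing', '</li>');
$rawListings = implode($rawListingsArray);
// the markup uses class="business", not id="business"
$businessArray = parseArray($rawListings, '<span class="business', '</span>');
print_r($businessArray);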
In theory, this script should print the name of every business listed at $url. Once each listing is separated into its own string to be parsed, I can make the script detect when a listing has no address and handle that case.
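If regular expressions prove fragile here, PHP's bundled DOM extension (`DOMDocument` plus `DOMXPath`) can do the same per-listing split with a real HTML parser. This swaps in a different technique from the `preg_match_all` approach in the question; `parseListingsDom` is a hypothetical name, and the XPath expressions assume the class names from the sample markup.

```php
<?php
// Hypothetical helper using PHP's DOM extension instead of regular expressions.
// Assumes listings are <li> elements whose class attribute contains "listing"
// and that fields are <span class="business"> / <span class="address">.
function parseListingsDom(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $records = [];
    foreach ($xpath->query('//li[contains(@class, "listing")]') as $li) {
        // Relative queries (".//") search only inside the current listing.
        $business = $xpath->query('.//span[@class="business"]', $li)->item(0);
        $address  = $xpath->query('.//span[@class="address"]', $li)->item(0);
        $records[] = [
            'business' => $business ? trim($business->textContent) : null,
            'address'  => $address ? trim($address->textContent) : null,
        ];
    }
    return $records;
}

// Usage: $records = parseListingsDom(file_get_contents($url));
```

Listings without an address come back with `'address' => null`, so the missing-address case can be handled with a plain `null` check.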
I hope all of this makes sense; it has been a little difficult to explain here.