Hi,

I am having a problem scraping data from a website. After scraping the page, I can't get my PHP script to output the data; it just shows an empty page.

Here is the HTML source I want to scrape:


<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
</li>
<li class="zc-ssl-pg" id="row1-4" style="">

<span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
<a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
</li>
<li class="zc-ssl-pg" id="row1-5" style="">

<span id="row5Time" class="zc-ssl-pg-time">1:00 PM</span>
<a id="rowTitle5" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>

Here is the PHP source:


<?php

$contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
preg_match('/<a id="rowTitle3" class="zc-ssl-pg-title"[.*]<\/a>/i', $data, $matches);
$rowtitle = $matches[1];
echo $rowtitle."<br>\n";
?>

And here is the PHP output:

<br>

Does anyone know how I can scrape the data from that website, starting at <a id="rowTitle3" and continuing to the end of the page?

Any advice would be much appreciated.

Thanks in advance

    Does this comply with the site's terms of service?

      TOS notwithstanding, since when is $contents == $data without direct assignment first? :eek:
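A minimal sketch of the corrected script, assuming the HTML is already held in a local string (here, part of the fragment quoted in the question) rather than fetched from the site. Besides searching `$contents` instead of the never-assigned `$data`, the pattern needs two more fixes: `[.*]` is a character class that matches a single literal `.` or `*`, not "anything", and without a capturing group `$matches[1]` is never set, which is why only `<br>` was printed.

```php
<?php
// Sample input (assumption): two lines of the listing markup quoted in the question.
// Working on a local string means the remote page is never fetched.
$contents = <<<'HTML'
<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
HTML;

// Fixes relative to the original script:
//  - search $contents, the variable that actually holds the HTML (not $data);
//  - [^>]* skips the remaining attributes (the href) up to the closing '>';
//  - (.*?) captures the link text, so $matches[1] exists on a match.
$pattern = '/<a id="rowTitle3" class="zc-ssl-pg-title"[^>]*>(.*?)<\/a>/i';

if (preg_match($pattern, $contents, $matches)) {
    $rowtitle = $matches[1];
    echo $rowtitle . "<br>\n";
} else {
    echo "No match<br>\n";
}
```

Run as-is, this prints `SportsCenter<br>`. It is still brittle: any change in attribute order or quoting on the page breaks the pattern.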

        From the Terms of Service (emphasis added):

        Terms of Service (September, 2007) wrote:

        You may not republish any portion of the Content on any Internet, Intranet or extranet site or incorporate the Content in any database, compilation, archive or cache. You may not distribute any Content to others, whether or not for payment or other consideration, and you may not archive, modify, copy, frame, cache, reproduce, sell, publish, transmit, display or otherwise use any portion of the Content. You may not scrape or otherwise copy our Content without permission.

        Even if you had gotten prior permission to use the content, they seem like the type that would have a special API for you to use. If they didn't, performing regular expression pattern matching on a page is about the worst method to scrape information from remote sites anyhow.
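Following the point about regular expressions, here is a sketch that parses the markup with PHP's built-in `DOMDocument` and `DOMXPath` classes instead. It again works on a local copy of the fragment from the question, not on the live site. The pairing of each `rowTitleN` link with a `rowNTime` span is an assumption inferred from that fragment and may not hold across a whole page.

```php
<?php
// Sample input (assumption): the time/title lines for rows 3-5 from the question.
$html = <<<'HTML'
<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
<a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<span id="row5Time" class="zc-ssl-pg-time">1:00 PM</span>
<a id="rowTitle5" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
HTML;

// Collect libxml warnings internally instead of printing them;
// fragments and real-world pages are rarely well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];

// Every title link has an id of the form "rowTitle{N}".
foreach ($xpath->query('//a[starts-with(@id, "rowTitle")]') as $link) {
    $n = (int) substr($link->getAttribute('id'), strlen('rowTitle'));
    if ($n < 3) {
        continue; // start at rowTitle3, as asked in the question
    }

    // Assumed naming convention: the matching time span is "row{N}Time".
    $timeNode = $xpath->query("//span[@id='row{$n}Time']")->item(0);

    $rows[] = [
        'time'  => $timeNode ? trim($timeNode->textContent) : '',
        'title' => trim($link->textContent),
        'href'  => $link->getAttribute('href'),
    ];
}

foreach ($rows as $row) {
    echo htmlspecialchars($row['time'] . ' ' . $row['title']) . "<br>\n";
}
```

Because the lookups go by element and attribute rather than by exact text layout, reordered attributes or changed quoting in the markup do not break the extraction the way they would a hand-written pattern.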

        Thread closed.
