Extracting text as a variable

unanimous_cowro

Hi all.

I'd like to display a "counter" on a page of mine, www.foo.com/example.php, like "Current number of items: X".

I want to grab the integer number X from a known text string on a page on another site, www.bar.com/blah.html, where there's a bit saying "X Total Items."

If it helps, the HTML code and text immediately before and after X don't occur anywhere else in the page I want to grab it from. ("<font size=+2>X Total Items").

Could someone please post an example code snippet for a total n00b.
Many thanks in advance!

(One of these days I'll learn PHP "for real", but I need this urgently... 😉 )

meetaby

string stristr ( string haystack, string needle)

Returns all of haystack from the first occurrence of needle to the end. needle and haystack are examined in a case-insensitive manner.

If needle is not found, returns FALSE.

If needle is not a string, it is converted to an integer and applied as the ordinal value of a character.
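A minimal sketch of how stristr() could be applied to the question: cut the page down to everything from the known marker onward, then read the leading digits. The helper name numberAfterMarker() and the sample HTML are hypothetical; in practice $page would hold the fetched page source.

```php
<?php
// Hypothetical helper: returns the integer that directly follows $marker,
// or false if the marker is not in $page.
function numberAfterMarker($page, $marker)
{
    // stristr() is case-insensitive and returns FALSE if the marker is missing
    $rest = stristr($page, $marker);
    if ($rest === false) {
        return false;
    }
    // Drop the marker itself; intval() stops at the first non-digit
    return intval(substr($rest, strlen($marker)));
}

// Sample markup (note the different case, which stristr() still matches)
$page = '<p><FONT SIZE=+2>1234 Total Items</FONT></p>';
echo numberAfterMarker($page, '<font size=+2>'); // prints 1234
```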

Kerry_Kobashi

I won't write the entire code but will give you some ideas...

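// fopen on a URL to the webpage you desire
// (needs allow_url_fopen enabled in php.ini)
$fp = fopen("http://www.foobar.com/thepageIwanttoScarf.html", "r");
if ($fp === false)
    die("Could not open the page");

$bFound = false;
$result = "";

// search the file reading one line of text in at a time
// assumes no line is over 1K in size
while (!feof($fp) && $bFound == false)
{
    $currLine = fgets($fp, 1024);
    if ($currLine === false)
        break;

    // what follows is specific to the token being searched;
    // this pattern captures X in "<font size=+2>X Total Items"
    if (preg_match('/<font size=\+2>(\d+) Total Items/i', $currLine, $matches))
    {
        $bFound = true;
        $result = $matches[1]; // the captured number
    }
}

// Finally close the file
fclose($fp);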
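The same line-by-line idea can be tried without network access by pointing it at an in-memory stream (php://memory) instead of a URL. This is a sketch only: findTotalItems() and the sample HTML are hypothetical, and the pattern assumes the "<font size=+2>X Total Items" markup described in the question.

```php
<?php
// Hypothetical helper: reads $fp line by line and returns X from
// "<font size=+2>X Total Items", or null if the token never appears.
function findTotalItems($fp)
{
    while (($line = fgets($fp, 1024)) !== false) {
        // Capture the digits between the known markup and " Total Items"
        if (preg_match('/<font size=\+2>(\d+) Total Items/i', $line, $m)) {
            return (int) $m[1];
        }
    }
    return null; // token not found
}

// Write some sample HTML into a memory stream, then rewind to read it back
$fp = fopen('php://memory', 'r+');
fwrite($fp, "<html>\n<body>\n<font size=+2>42 Total Items</font>\n</body>\n");
rewind($fp);
echo findTotalItems($fp); // prints 42
fclose($fp);
```

Swapping the memory stream for fopen() on the real URL gives the version that fetches the page.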

Kerry_Kobashi

See if you can expand on this for starters... 😉

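<?php

// Print a page line by line; a starting point to build the search into
function PageSnag($url)
{
    $fp = fopen($url, "r");
    if ($fp === false)
        return;

    // set $bFound to true once the token is found to stop reading early
    $bFound = false;
    while (!feof($fp) && $bFound == false)
    {
        $line = fgets($fp, 1024);
        echo $line;
    }
    fclose($fp);
}

PageSnag("http://www.phpbuilder.com");
?>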

unanimous_cowro

Thanks, meetaby & Kerry!

However, I'm totally new to PHP and I don't know how I should apply your examples. Looking at them, I'm not even sure I've explained what I want... 😉

As a more specific example, look at the standard Google front page. At the bottom it right now says "Searching 3,083,324,652 web pages" (Let's pretend it actually says 3083324652 without commas, for simplicity's sake...)

I want to grab this integer number, X, whatever it might be at any given moment, and have it displayed in the text of an HTML document of my own. (My unqualified newbie guess says that it should be put in a variable and echoed by PHP.)
"Right now Google has indexed X pages."
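The newbie guess is right: once X is in a variable, it only needs to be echoed into the page. A minimal sketch, where counterSentence() is a hypothetical helper and $X is a hard-coded placeholder standing in for the scraped value:

```php
<?php
// Hypothetical helper: builds the sentence from the scraped value.
// htmlspecialchars() guards against stray markup ending up in the page.
function counterSentence($X)
{
    return "Right now Google has indexed " . htmlspecialchars($X) . " pages.";
}

$X = "3083324652"; // placeholder; in practice this comes from the scrape
echo counterSentence($X);
```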

What is known in this case:

The URL: http://www.google.com/index.html
The integer number I'm looking for equals the string found between the unique phrases/words "Searching " and " web pages".
Furthermore, this number always appears on line 15 of the google.com/index.html source, if that helps.
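A sketch of how the "line 15" hint and the two known phrases could be combined, assuming $page already holds the page source as one string. countFromLine() is a hypothetical helper. The regex allows commas, which are stripped afterwards; the result is kept as a string because 3083324652 is larger than the maximum 32-bit integer (2147483647). Relying on a fixed line number is fragile: any change to the page layout breaks it.

```php
<?php
// Hypothetical helper: finds "Searching X web pages" on the given
// 1-based line and returns X without commas, or null if not found.
function countFromLine($page, $lineNumber)
{
    $lines = explode("\n", $page);
    if (!isset($lines[$lineNumber - 1])) {
        return null; // the page has fewer lines than expected
    }
    // Allow commas inside the number, then strip them afterwards
    if (!preg_match('/Searching ([\d,]+) web pages/', $lines[$lineNumber - 1], $m)) {
        return null; // the phrase is not on that line (the layout changed)
    }
    return str_replace(',', '', $m[1]);
}

// Sample page: 14 filler lines, then the interesting line 15
$page = str_repeat("filler\n", 14) . "<font size=-2> - Searching 3,083,324,652 web pages</font>\n";
echo countFromLine($page, 15); // prints 3083324652
```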

If a kind soul could post a code snippet for this specific Google example, then I'm sure that this lazy and stressed newbie could adapt it for all of his own future needs! 🙂

Kerry_Kobashi

No, you don't have to explain... I understand what you are trying to do.

Be aware that some websites try to prevent you from scraping information off their pages, for example by placing random text after the main token you are looking for.

Also, you need to make sure you have authorization to scrape this information off their website. Read their terms of service. Don't ever think that you won't get caught scraping information without approval. They will log your access and the information is copyrighted.

Again, I won't write out the entire code for you. You will need to do this yourself. The code in the previous two messages is as far as I will go. Your job is to do some research on how to accomplish string matching.

Good luck.

reidme

I have just had to do this exact same thing, and as a newbie it took me a long time to get my head around the string functions:
strpos, strstr, str_replace, substr, substr_replace. These are the functions you need. Using fopen, grab the whole page into a variable, use strpos to locate the position of 'X' in the string, then use substr to copy the value of 'X' into a $variable.
Here is the google example:

// get contents of a file into a string
// (filesize() does not work on URLs, so read in chunks until end of file)
$filename = "http://www.google.com/index.html";
$handle = fopen ($filename, "r");
if ($handle === false)
    die("Could not open $filename");
$content = "";
while (!feof($handle)) {
    $content .= fread($handle, 8192);
}
fclose ($handle);

$search_start_string = "<font size=-2> - Searching ";
$search_end_string = " web pages</font>";
$searching_position = strpos($content, $search_start_string);
if ($searching_position === false)
    die("Start string not found");
$X_start_position = $searching_position + strlen($search_start_string); // 27 chars
// look for the end string only after the start of X
$X_end_position = strpos($content, $search_end_string, $X_start_position);
$X_length = $X_end_position - $X_start_position;

$X = substr($content, $X_start_position, $X_length);
echo $X; // prints '3,083,324,652' (well for now anyway)
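A worked example of the offset arithmetic, run on a short sample string instead of the live page so the numbers can be checked by hand. The sample string is made up; strlen() confirms that the start string is 27 characters long, so there is no need to count them manually.

```php
<?php
// Sample content: 3 junk characters, then the markup from the Google page
$content = 'abc<font size=-2> - Searching 3,083,324,652 web pages</font>xyz';
$search_start_string = "<font size=-2> - Searching ";
$search_end_string = " web pages</font>";

$searching_position = strpos($content, $search_start_string);                   // 3
$X_start_position = $searching_position + strlen($search_start_string);         // 3 + 27 = 30
$X_end_position = strpos($content, $search_end_string, $X_start_position);      // 43
$X = substr($content, $X_start_position, $X_end_position - $X_start_position);  // 13 chars
echo $X; // prints 3,083,324,652
```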

Hope this helps. If you are still stuck, post again.

unanimous_cowro

reidme, that worked brilliantly. Thanks!

Kerry, I share your stance against "hand-holding" and just dumping example code at someone who's trying to learn a language. But in this case I'm not trying to learn PHP. Not just yet. 😉 I just wanted some consulting for free for a specific problem... (Not that I'm running a business or that this is to be used for anything remotely commercial!)

Anyway, now I know that this forum seems to be the place to go when I start fiddling with PHP "for real". Thanks guys! 🙂

reidme

You are welcome. I am trying to learn the language and the newbie posts on this forum are good tasks for me to get my teeth into.

Kerry_Kobashi

Nice job on the scrape code, reidme.

I've rewritten the code so that it is reusable.

<?php

// Load the entire contents of a url into $content
// (fopen on a URL needs allow_url_fopen enabled)
function LoadUrlContent($url, &$content)
{
    $content = "";
    $fp = fopen($url, "r");
    if ($fp === false)
        return false;
    while (!feof($fp))
    {
        // read in chunks of up to 4K at a time
        $content .= fread($fp, 4096);
    }
    fclose($fp);
    return true;
}

// Scrape a page based on starting and ending pattern
function PageScrape($url, $strStartPattern, $strEndPattern)
{
    $strScrape = "";

    // read in the entire $url page into memory
    if (!LoadUrlContent($url, $content))
        return($strScrape);

    // search for starting text and position to beginning of text to be scraped;
    // use === because strpos() returns 0 for a match at the very start
    $pos = strpos($content, $strStartPattern);
    if ($pos === false)
        return($strScrape);
    $startPos = $pos + strlen($strStartPattern);

    // search for ending text, but only after the starting text
    $endPos = strpos($content, $strEndPattern, $startPos);
    if ($endPos === false)
        return($strScrape);

    // Scrape the text between the two patterns
    $strScrape = substr($content, $startPos, $endPos - $startPos);

    return($strScrape);
}

Example:

Get Google page information:

echo PageScrape("http://www.google.com", "<font size=-2> - Searching ", " web pages</font>");

Get Yahoo news information:

echo PageScrape("http://www.yahoo.com", "<table width=100% cellpadding=8 cellspacing=0 border=0 bgcolor=f1f1fd class=yhmnwbd>", "<hr noshade size=1 color=d0d3f2>");
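One possible refinement, sketched here as a suggestion rather than part of the code above: pull the scraping step out of PageScrape() into a function that takes the page source as a string, so it can be tested on saved HTML without fetching anything. ScrapeString() is a hypothetical name. The sample places the start pattern at offset 0 on purpose: strpos() returns 0 there, which a plain `== false` test would wrongly treat as "not found".

```php
<?php
// Hypothetical helper: returns the text between the first $strStartPattern
// and the next $strEndPattern after it, or "" if either is missing.
function ScrapeString($content, $strStartPattern, $strEndPattern)
{
    $pos = strpos($content, $strStartPattern);
    if ($pos === false)          // === matters: a match at offset 0 is valid
        return "";
    $startPos = $pos + strlen($strStartPattern);

    // Only look for the end pattern after the start pattern
    $endPos = strpos($content, $strEndPattern, $startPos);
    if ($endPos === false)
        return "";

    return substr($content, $startPos, $endPos - $startPos);
}

$html = '<font size=-2> - Searching 3,083,324,652 web pages</font>';
echo ScrapeString($html, "<font size=-2> - Searching ", " web pages</font>");
// prints 3,083,324,652
```

PageScrape() could then be written as LoadUrlContent() followed by ScrapeString() on the loaded $content.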

P.S.
I hold no responsibility for usage of this code. For educational purposes only. 😃