I am trying to parse the contents of a file and insert the words into a database. For some reason, however, each word is returned twice. Any ideas? Try it with a few words separated by spaces: each one comes back twice. The script is a version of Dan Solin's search engine script, which stores the words from a file in a DB. I haven't got a clue what's wrong (I'm not too good with multi-dimensional arrays...).
Many thanks,
ucbones
<?php
/* Define the URL that should be processed: */
$url = addslashes( $_GET['url'] );
if( !$url ){
    die( "You need to define a URL to process." );
}
else if( substr($url,0,7) != "http://" ){
    $url = "http://$url";
}
/* Start parsing through the text */
if( !($fd = fopen($url,"r")) )
    die( "Could not open URL!" );
$startcontent=false;
while( $buf = fgets($fd,1024) ){
    /* Remove whitespace from beginning and end of string: */
    $buf = trim($buf);
    /* Try to remove all HTML-tags: */
    $buf = strip_tags($buf);
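    /* Remove HTML entities such as &amp; (PCRE syntax needs preg_replace, not ereg_replace): */
    $buf = preg_replace('/&\w+;/', '', $buf);
    /* Remove any leftover literal "nbsp": */
    $buf = str_replace('nbsp', '', $buf);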
    /* Extract all words matching the regexp from the current line: */
    preg_match_all("/(\b[\w+]+\b)/",$buf,$words);
    /* Loop through all words/occurrences and insert them into the database: */
    for( $i = 0; $words[$i]; $i++ ) {
        for( $j = 0; $words[$i][$j]; $j++ ) {
            $cur_word = addslashes( strtolower($words[$i][$j]) );
            print "Indexing: $cur_word<br>";
        }
    }
}
fclose($fd);
?>
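A possible explanation for the doubled words, offered as a sketch rather than a confirmed diagnosis: with its default flag (PREG_PATTERN_ORDER), preg_match_all() puts the full matches in $words[0] and whatever the first parenthesised group captured in $words[1]. Because the parentheses in the pattern wrap the whole expression, both sub-arrays end up holding the same words, and the nested loop walks through both. The sample string "Hello big world" and the helper name extract_words() below are made up for illustration and are not part of the original script.

```php
<?php
/* Hypothetical helper: returns the lowercased words found in one line of text. */
function extract_words($line) {
    /* Same character class as the original pattern, but without the
       capturing parentheses, so only $matches[0] is filled. */
    preg_match_all('/\b[\w+]+\b/', $line, $matches);
    return array_map('strtolower', $matches[0]);
}

/* The original pattern, for comparison: */
preg_match_all("/(\b[\w+]+\b)/", "Hello big world", $words);
/* $words[0] holds the full matches and $words[1] holds group 1,
   so the same three words appear in both sub-arrays. */
print_r($words);

/* Looping over a single flat list indexes each word once. */
foreach (extract_words("Hello big world") as $cur_word) {
    print "Indexing: $cur_word<br>";
}
```

If the parentheses are needed for some other reason, looping over $words[0] alone (and dropping the outer $i loop) should have the same effect.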