I have written a script to compare files.
There are 1000 files, so comparing each one with every other
gives approximately 1 million combinations.
However, it only compares two at a time and should
overwrite the variables that hold the file contents on each iteration
of the for loop, so I don't see why it is using up the memory!
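As a quick check of the "approx 1 million" figure, here is a minimal sketch of the arithmetic. The function names ordered_pairs() and unordered_pairs() are made up for illustration and are not part of the script; the ordered count corresponds to two nested loops that skip only identical paths.

```php
<?php
// Hypothetical helpers, for illustration only.

// Every file against every other file, counting (a, b) and (b, a) separately.
function ordered_pairs(int $n): int
{
    return $n * ($n - 1);
}

// Each pair counted once, i.e. comparing a with b but not b with a again.
function unordered_pairs(int $n): int
{
    return intdiv($n * ($n - 1), 2);
}

echo ordered_pairs(1000), "\n";   // 999000
echo unordered_pairs(1000), "\n"; // 499500
```

So the full run is just under 1 million comparisons, and roughly half of those are repeats of the same pair in reverse order.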
This is the script:
$path = "/home/compare/projects/$Db_directory/";
$art_path = $path.'spin/';
$out_file = $path.'page_check.txt';
$out_file1 = $path.'page_check80.txt';
$out_file2 = $path.'page_check70.txt';
$out_file3 = $path.'page_check60.txt';
$fp = fopen("$out_file", "wb");
$fp1 = fopen("$out_file1", "wb");
$fp2 = fopen("$out_file2", "wb");
$fp3 = fopen("$out_file3", "wb");
if ($fp === FALSE ) {
echo "Problem opening file: $out_file<br>";
exit;
}
$output = 'Starting Comparison Run - '.$logstamp.'\n';
fwrite($fp, $output, strlen($output));
$art_cnt = 0;
if ($handle = opendir($art_path)) {
while (($file = readdir($handle)) !== false){
if (!in_array($file, array('.', '..')) && !is_dir($art_path.$file))
$art_cnt++;
}
}
$remove_these = array(',', '.', '!', '?', ':', ';');
// 'XZ', '[hd2]', '[hd3]', '[hd4]', '[hd5]', '[hd6]',
// '[b]', '[z]', '[u]', '[li]', '[l]', '[r]', '[str]', '[c]', '[em]', '[/b]', '[/z]', '[/u]', '[/li]', '[/l]', '[/r]', '[/str]', '[/c]', '[/em]',
// '[list=dc w=400]', '[list=dc w=500]', '[/list]');
$art1 = 0;
$art2 = 0;
$data = array();
// Select article to compare with all others
for ($art1 = 0; $art1 <= $art_cnt; $art1++) {
// For files that are named: 1.txt, 2.txt etc.
$filename1 = $art1.$Db_file_end;
$art_path1 = $art_path.$filename1;
// Now select the second article for the comparison
for ($art2 = 0; $art2 <= $art_cnt; $art2++) {
$filename2 = $art2.$Db_file_end;
$art_path2 = $art_path.$filename2;
// Ensure we are comparing different articles
if( $art_path1 != $art_path2) {
if (file_exists($art_path1)) {
// echo "Filename1: $path1<br>";
if (file_exists($art_path2)) {
// echo "Filename2: $path2<br>";
$article1 = file_get_contents($art_path1);
$article1 = strtolower($article1);
$article1 = str_replace($remove_these, '', $article1);
$article1 = str_replace('  ', ' ', $article1);
$words1 = explode(' ', $article1);
$article2 = file_get_contents($art_path2); // LINE 127
$article2 = strtolower($article2);
$article2 = str_replace($remove_these, '', $article2);
echo "<br>First Article: $art_path1 <br>$article1<br><br>";
echo "<br><br>Second Article: $art_path2 <br>$article2<br><br>";
$found_match = 0;
$word_count = count($words1) - $Db_words;
$found_match = 1;
[DO THE COMPARE ]
$output = "$art1 vs $art2 = $compare".'\r\n';
fwrite($fp, $output, strlen($output));
echo "$output<br>";
if($unique < 80){
fwrite($fp1, $output, strlen($output));
}
if($unique < 70){
fwrite($fp2, $output, strlen($output));
}
if($unique < 60){
fwrite($fp3, $output, strlen($output));
}
} // end if file_exists($art_path2)
} // end if file_exists($art_path1)
} // end different articles
} // end for - to select second article
} // end for - to select first article
$logstamp1 = date('H:i:s l, j F Y');
$output = "Finished $logstamp1";
fwrite($fp, $output, strlen($output));
fwrite($fp1, $output, strlen($output));
fwrite($fp2, $output, strlen($output));
fwrite($fp3, $output, strlen($output));
fclose($fp);
fclose($fp1);
fclose($fp2);
fclose($fp3);
The last entry written to the log file ($fp) is:
252 vs 236 = 85\r\n252 vs 237 = 85\r\n252 vs 238 = 85
So it gets through a lot of them: it is comparing file 252 with file 238 when
it runs out of memory.
The failure is at line 127 (marked in the script above):
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 8192 bytes) in /home/compare/check.php on line 127,
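To narrow this down, memory use could be logged on each iteration to see whether it climbs steadily or jumps at one point. This is only a minimal sketch: memory_report() is a hypothetical helper that is not in the script, built on PHP's memory_get_usage() and memory_get_peak_usage(), and the $buffer array is just a stand-in for anything that accumulates across iterations.

```php
<?php
// Hypothetical helper: format current and peak memory use for a log line.
function memory_report(string $label): string
{
    return sprintf(
        "%s: current=%d bytes, peak=%d bytes",
        $label,
        memory_get_usage(),
        memory_get_peak_usage()
    );
}

// Stand-in loop: $buffer grows on every pass, so the reported
// figures should rise from one iteration to the next.
$buffer = array();
for ($i = 0; $i < 3; $i++) {
    $buffer[] = str_repeat('x', 100000);
    echo memory_report("iteration $i"), "\n";
}
```

In the real script the call would go inside the outer for loop, with the result written to the log file via fwrite() rather than echoed.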
Can you see why it is using up the memory?
Thanks
David.