[RESOLVED] array_diff to compare multiple (4) arrays

akremedy · Dec 21, 2006

Learned a lot about array functions today, but have run out of research energy to address this last bit of my project...

I've got four arrays that I'm constructing from two distinct tables (table 1 -and- table 2) and two distinct directories (dir 1 -and- dir 2). All four sources will normally contain a file with the same filename.

Dir 1 and dir 2 will never contain duplicate files (obviously). However, table 1 and table 2 may contain duplicate references to a file.

It is possible that one or more of the four sources may become out of sync and no longer contain file (or reference to file) "xyz" - these are considered to be orphaned.

Finally - the question:
Is there an elegant approach to ultimately arriving at an array_diff that can determine if any given file doesn't exist in all four places? The problem that I've got with array_diff now is that the first array provides the basis for comparison, and I can't rely on the first array always containing every file. I guess I could run array_diff 24 times comparing each combination of the four arrays (4*3*2*1), or a bunch of array_intersect/array_diff combinations but that seems kind of crazy. There must be a more logical way.

Any suggestions?

NogDog · Dec 21, 2006

Not sure how elegant this is, but.... You could load your data into one 2-dim array where the first index is the file name, and the second is the data source. Then you'd loop through each data source and add an element to the array, ending up with something like:

// file1 in all 4 locations:
$array['file1']['dir1'] = 1;
$array['file1']['dir2'] = 1;
$array['file1']['table1'] = 1;
$array['file1']['table2'] = 1;
// file 2 in only 3 locations:
$array['file2']['dir1'] = 1;
$array['file2']['dir2'] = 1;
$array['file2']['table1'] = 1;

Then you can check for missing files:

foreach($array as $key => $val)
{
   if(count($val) < 4)
   {
      echo "<p>File $key not in all locations:</p>\n<pre>";
      print_r($val);
      echo "</pre>\n";
   }
}
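
A minimal sketch of how that array might get filled in the first place, assuming the four sources have already been read into plain lists of filenames. The $sources and $orphans names and the sample filenames are made up for illustration.

```php
<?php
// Hypothetical sample data: each source is a plain list of filenames.
$sources = array(
    'dir1'   => array('file1', 'file2'),
    'dir2'   => array('file1', 'file2'),
    'table1' => array('file1', 'file2', 'file1'), // tables may hold duplicate references
    'table2' => array('file1'),
);

// First index is the file name, second is the data source.
$array = array();
foreach ($sources as $source => $filenames) {
    foreach ($filenames as $filename) {
        $array[$filename][$source] = 1; // a duplicate just overwrites the same slot
    }
}

// For each file missing from at least one source, list the sources it is missing from.
$orphans = array();
foreach ($array as $filename => $found) {
    if (count($found) < count($sources)) {
        $orphans[$filename] = array_keys(array_diff_key($sources, $found));
    }
}

print_r($orphans);
```

Because the source name is used as the second index, duplicate references in the tables don't inflate the count.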

Weedpacket · Dec 21, 2006

I'm going to just call the arrays $array1 ... $array4 at the moment and assume that they all contain comparable filenames (if the file is called "foo.bar" in one array, it will be "foo.bar" in the others). The following will remove from $array1...$array4 any entries that appear in all four arrays, leaving only those that are missing from at least one of them.

$combined_arrays = array_intersect($array1, $array2, $array3, $array4);
$array1 = array_diff($array1, $combined_arrays);
$array2 = array_diff($array2, $combined_arrays);
$array3 = array_diff($array3, $combined_arrays);
$array4 = array_diff($array4, $combined_arrays);
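
A quick worked example of the same idea, with made-up sample data and made-up variable names ($in_all, $left1 ... $left4, $orphans). One thing worth knowing is that array_diff() keeps the original keys, so the sketch reindexes with array_values().

```php
<?php
// Made-up sample data; 'all.four' is the only name present in every list.
$array1 = array('all.four', 'foo.bar', 'far.boo');
$array2 = array('all.four', 'foo.bar', 'bar.foo');
$array3 = array('all.four', 'foo.bar');
$array4 = array('all.four', 'far.boo');

// Names found in all four arrays.
$in_all = array_intersect($array1, $array2, $array3, $array4);

// What is left in each array is missing from at least one other array.
// array_diff() preserves keys, so array_values() renumbers from 0.
$left1 = array_values(array_diff($array1, $in_all));
$left2 = array_values(array_diff($array2, $in_all));
$left3 = array_values(array_diff($array3, $in_all));
$left4 = array_values(array_diff($array4, $in_all));

// One combined list of every out-of-sync filename.
$orphans = array_values(array_unique(array_merge($left1, $left2, $left3, $left4)));

print_r($orphans);
```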

akremedy · Dec 23, 2006

Thanks for the ideas, guys. I've done some thinking and looking into multi-dimensional arrays and use of hashes to construct the matrix that I'm ultimately looking for. Here's what I've come up with that seems to be working, and is within my grasp:

First: I splice all the arrays together so that I'm guaranteed that all filenames are found, regardless of which data sources (table1, table2, dir1, and/or dir2) they may or may not be found in.

$allfnames = $table1;
array_splice($allfnames,1,0,$table2);
array_splice($allfnames,1,0,$dir1);
array_splice($allfnames,1,0,$dir2);

Next: Get rid of the duplicates in the spliced array:

$allfnames = array_unique($allfnames);

Finally: Loop through the spliced and parsed array and look for matches against each individual array:

echo"<table><tr><td></td><td>image</td><td>blog</td><td>thumb</td><td>fullsize</td></tr>";
  foreach ($allfnames as $filename) {
    echo "<tr><td>$filename</td>";
    $s1 = array_search($filename,$table1,false);
    if($s1 !== false) {
      echo "<td>X</td>";
    } else {
      echo "<td></td>";
    }
    $s2 = array_search($filename,$table2,false);
    if($s2 !== false) {
      echo "<td>X</td>";
    } else {
      echo "<td></td>";
    }
    $s3 = array_search($filename,$dir1,false);
    if($s3 !== false) {
      echo "<td>X</td>";
    } else {
      echo "<td></td>";
    }
    $s4 = array_search($filename,$dir2,false);
    if($s4 !== false) {
      echo "<td>X</td></tr>";
    } else {
      echo "<td></td></tr>";
    }
}
echo"</table>";

So that it looks like this:

..............table 1....table 2....dir 1....dir 2
foo.bar........X...........X..........X........
bar.foo.....................X..........X........
far.boo........X...........X...................X
boo.far........X................................X
etc...

Thanks again,
ak

Weedpacket · Dec 23, 2006

array_merge might be simpler than array_splice.

But here's a variation that doesn't require repeated searching. (More exactly, it searches by key instead of value, making it quite a bit faster).

$array1 = array('foo.bar','far.boo','boo.far');
$array2 = array('foo.bar','bar.foo','far.boo');
$array3 = array('foo.bar','bar.foo');
$array4 = array('far.boo','boo.far');

$all_names = array_unique(array_merge($array1,$array2,$array3,$array4));
sort($all_names); // Sorts the file names, making it easier to eyeball

foreach($all_names as $filename)
	$results1[$filename] = false;
$results2 = $results3 = $results4 = $results1;

foreach($array1 as $filename) $results1[$filename]=true;
foreach($array2 as $filename) $results2[$filename]=true;
foreach($array3 as $filename) $results3[$filename]=true;
foreach($array4 as $filename) $results4[$filename]=true;

echo "<tr><th></th><th>table1</th><th>table2</th><th>dir1</th><th>dir2</th></tr>\n";
foreach($all_names as $name)
{
	echo"<tr><td>$name</td><td>", ($results1[$name]?'X':'.'), "</td><td>", ($results2[$name]?'X':'.'), "</td><td>", ($results3[$name]?'X':'.'), "</td><td>",($results4[$name]?'X':'.'), "</td></tr>\n";
}

akremedy · Dec 23, 2006

Very cool! I like your variation a lot and will be going with it instead of mine.

BTW - I had avoided array_merge as (based on what I was reading) it was suggested that I'd run into problems with duplicate keys - unfortunately, it didn't differentiate between string vs numeric keys.

Weedpacket · Dec 23, 2006

The keys won't be duplicated (keys can't be duplicated); that note on the page is referring to the values. And the potential (indeed, the expectation) that there would be multiple instances of the same value is the reason I had the array_unique() call there. I use array_merge all the time, myself.
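
For what it's worth, a small sketch of how array_merge treats the two kinds of keys. The filenames and the variable names ($numeric, $string, $unique) are made up for illustration.

```php
<?php
// Numeric keys: values are appended and the result is renumbered from 0,
// so the same value can show up more than once.
$numeric = array_merge(array('a.jpg', 'b.jpg'), array('b.jpg', 'c.jpg'));

// String keys: a later value replaces an earlier one stored under the same key.
$string = array_merge(array('x' => 'a.jpg'), array('x' => 'c.jpg'));

// array_unique() drops the repeated values; array_values() renumbers the keys.
$unique = array_values(array_unique($numeric));

print_r($numeric);
print_r($string);
print_r($unique);
```

Lists of filenames pulled from a directory or a query normally have numeric keys, so only the first case applies here.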

I fiddled with it a bit afterwards, and skipped the need for the $all_names array:

$results1 = array_flip(array_merge($array1,$array2,$array3,$array4));
ksort($results1);
foreach($results1 as $filename=>$whocares)
    $results1[$filename] = false;

$results2 = $results3 = $results4 = $results1;
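
Put together with the sample arrays from the earlier post, the whole thing might look like the sketch below. The $rows array is made up here so the result can be inspected as data rather than echoed straight into a table.

```php
<?php
$array1 = array('foo.bar', 'far.boo', 'boo.far');
$array2 = array('foo.bar', 'bar.foo', 'far.boo');
$array3 = array('foo.bar', 'bar.foo');
$array4 = array('far.boo', 'boo.far');

// array_flip() turns the filenames into keys, which also collapses duplicates.
$results1 = array_flip(array_merge($array1, $array2, $array3, $array4));
ksort($results1);
foreach ($results1 as $filename => $whocares)
    $results1[$filename] = false;

$results2 = $results3 = $results4 = $results1;

// Mark each filename as present in the arrays that contain it.
foreach ($array1 as $filename) $results1[$filename] = true;
foreach ($array2 as $filename) $results2[$filename] = true;
foreach ($array3 as $filename) $results3[$filename] = true;
foreach ($array4 as $filename) $results4[$filename] = true;

// Hypothetical summary: one string per file, 'X' = present, '.' = missing.
$rows = array();
foreach ($results1 as $name => $in_first) {
    $rows[$name] = ($in_first ? 'X' : '.')
                 . ($results2[$name] ? 'X' : '.')
                 . ($results3[$name] ? 'X' : '.')
                 . ($results4[$name] ? 'X' : '.');
}

print_r($rows);
```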
