Remove duplicate values from an array

alirezaok

You know that array_unique() removes duplicate values from an array when two elements are considered equal.

But I want to remove duplicate values when the first 1/3 of their length is equal.

In other words, if substr($a[$i], 0, strlen($a[$i]) / 3) has a duplicate, remove that value.

Any ideas?

bradgrafelman

Well... I don't know how efficient this is, but I would build a second array holding the first 1/3 of each value, then loop through the original array and use in_array() in an if statement to check whether the shortened value is already in the new array. Something like this...

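function array_unique_length($array, $length = FALSE) {
	// Default to comparing the first third of each string.
	if (!$length)
		$length = 1/3;

	$tempArray = array();
	foreach ($array as $key => $value) {
		// The prefix length is truncated to a whole number of characters.
		$short = substr($value, 0, (int) (strlen($value) * $length));

		// Strict comparison, so prefixes such as '1e1' and '10' are not treated as equal.
		if (!in_array($short, $tempArray, true))
			$tempArray[$key] = $short;
		else
			unset($array[$key]);
	}
	return $array;
}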

$myArray = array(
	'this is a test',
	'blah blah...',
	'foobar...',
	'this is funny'
);

print_r( array_unique_length($myArray, 1/3) );

The output of print_r() would be this:

Array
(
    [0] => this is a test
    [1] => blah blah...
    [2] => foobar...
)

EDIT: Forgot to explain as well - array_unique_length() requires at least one parameter. The first parameter is the array you wish to prune duplicate entries from. The second parameter is the fraction of each string's length you wish to compare. If you leave it out, it defaults to 1/3, as you suggested.
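On the efficiency doubt raised above: in_array() scans the temporary array on every iteration. A possible variant, sketched here as an assumption rather than as part of the original post, stores each prefix as an array key and checks it with isset() instead. The function name array_unique_length_keyed() is hypothetical.

```php
<?php
// Hypothetical variant of array_unique_length(); the name and the keyed
// lookup are illustrative assumptions, not code from the post above.
function array_unique_length_keyed(array $array, $length = null) {
    if ($length === null) {
        $length = 1 / 3; // same default as the original function
    }

    $seen = array(); // prefixes already encountered, stored as keys
    foreach ($array as $key => $value) {
        // The (string) cast guards against substr() returning false,
        // which older PHP versions did in some edge cases.
        $short = (string) substr($value, 0, (int) (strlen($value) * $length));

        if (isset($seen[$short])) {
            unset($array[$key]); // prefix seen before: drop this entry
        } else {
            $seen[$short] = true;
        }
    }
    return $array;
}

$myArray = array(
    'this is a test',
    'blah blah...',
    'foobar...',
    'this is funny'
);

print_r(array_unique_length_keyed($myArray));
```

With the sample array this keeps the same three entries, under the same keys, as the print_r() output shown earlier.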

laserlight

Hmm... so basically, two strings are considered equal if the first 1/3 of one string is equal to the first portion of the other string?

This is what I suggest:
1. Sort the array of strings.
2. Place the first string in the array of unique strings.
3. Take the first 1/3 of the first string as the string under consideration. Loop over the rest of the array, attempting to match as many leading characters of each string as there are in the string under consideration.
4. If a match fails, add the string that caused the failure to the array of uniques. Make the first 1/3 of this string the new string under consideration, then continue looping.

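$array = array(/* ... */);
sort($array); // note: this reorders the values and discards the original keys
$uniques = array($array[0]);
// The string under consideration is the first third of the first string.
$len = (int)(strlen($array[0]) / 3);
$str = substr($array[0], 0, $len);
foreach ($array as $value) {
	// Strict comparison, so numeric-looking prefixes are compared as strings.
	if ($str !== substr($value, 0, $len)) {
		// No match: keep this string and make its first third the new reference.
		$uniques[] = $value;
		$len = (int)(strlen($value) / 3);
		$str = substr($value, 0, $len);
	}
}
$array = $uniques;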

This algorithm makes only a single pass through the array after sorting, so informally I would say its complexity is dominated by the sort. That is likely to be O(n log n), which is better than the O(n**2) complexity of bradgrafelman's algorithm (in_array() scans the temporary array on every iteration).
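For anyone who wants to run it, here is a rough sketch of the same sorted single-pass idea wrapped in a function and applied to bradgrafelman's sample data. The function name unique_by_sorted_prefix(), the empty-array guard and the SORT_STRING flag are assumptions added for illustration.

```php
<?php
// Hypothetical wrapper around the sorted single-pass approach; the name,
// the empty-array guard and SORT_STRING are illustrative assumptions.
function unique_by_sorted_prefix(array $array) {
    if (count($array) === 0) {
        return array(); // nothing to compare
    }

    // Sorting places strings that share a prefix next to each other.
    // It also reorders the values and discards the original keys.
    sort($array, SORT_STRING);

    $uniques = array($array[0]);
    $len = (int) (strlen($array[0]) / 3);
    $str = substr($array[0], 0, $len);

    foreach ($array as $value) {
        if ($str !== substr($value, 0, $len)) {
            // No match: keep the string and use its first third from now on.
            $uniques[] = $value;
            $len = (int) (strlen($value) / 3);
            $str = substr($value, 0, $len);
        }
    }
    return $uniques;
}

print_r(unique_by_sorted_prefix(array(
    'this is a test',
    'blah blah...',
    'foobar...',
    'this is funny'
)));
```

On this input the result holds 'blah blah...', 'foobar...' and 'this is a test', in sorted order and with fresh keys 0 to 2, so the same entries survive as in bradgrafelman's version but the original order and keys are lost.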

EDIT:
My implementation has a bug for strings shorter than 3 characters: 1/3 of their length truncates to 0, so the string under consideration is empty and matches everything. However, the original formulation of the problem makes no provision for such strings.
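One possible way around the short-string case, assuming a prefix of at least one character is acceptable (the original question does not say), is to clamp the prefix length. The helper name prefix_length() is hypothetical.

```php
<?php
// Hypothetical helper: the name and the one-character minimum are
// assumptions, since the question does not define the short-string case.
function prefix_length($value) {
    $len = strlen($value);
    // A third of the length, but never fewer than one character,
    // and never more than the string actually has (covers '').
    return min($len, max(1, (int) ($len / 3)));
}

$samples = array('ab', 'ac', 'abcdef');
foreach ($samples as $s) {
    echo $s, ' -> ', substr($s, 0, prefix_length($s)), "\n";
}
```

Under this assumption 'ab' and 'ac' both reduce to 'a' and would count as duplicates. If short strings should instead be compared whole, the helper could return strlen($value) for them.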