What I meant is:
Is the best approach to get record 1, compare it with all other records, find all matches, determine the match percentage, then go to record 2, compare it with all other records, and so on?
Or is it better to have some kind of function that picks random record IDs and compares each randomly picked record with all other records?
Let's say we have 1000 records. Isn't it a better approach to first get record 500 and process it, then record 1, then record 1000, like this:
500 compare
1 compare
1000 compare
499 compare
2 compare
999 compare
...
I hope you understand what I mean.
I think the strategy for comparing is this:
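A quick sketch to check whether the order matters, assuming record IDs are simply 1..n (the function names `middle_out_order` and `count_comparisons` are hypothetical). It generates the 500, 1, 1000, 499, 2, 999, ... order described above and counts the comparisons needed when each record is only compared with records that have not been processed yet:

```python
def middle_out_order(n):
    """Return record ids 1..n in the order mid, 1, n, mid-1, 2, n-1, ..."""
    seen = set()
    order = []
    mid, low, high = n // 2, 1, n
    while len(order) < n:
        # Take one id from each of the three "cursors", skipping ids already used.
        for candidate in (mid, low, high):
            if 1 <= candidate <= n and candidate not in seen:
                seen.add(candidate)
                order.append(candidate)
        mid -= 1
        low += 1
        high -= 1
    return order


def count_comparisons(order):
    """Count comparisons when each record is only compared with unprocessed records."""
    processed = set()
    comparisons = 0
    for record_id in order:
        # Records processed earlier already compared themselves with record_id.
        others = [o for o in order if o != record_id and o not in processed]
        comparisons += len(others)
        processed.add(record_id)
    return comparisons


n = 1000
print(middle_out_order(n)[:6])                    # [500, 1, 1000, 499, 2, 999]
print(count_comparisons(middle_out_order(n)))     # 499500
print(count_comparisons(list(range(1, n + 1))))   # 499500
```

Under this assumption both orders need n(n-1)/2 = 499,500 comparisons for 1000 records, so the visiting order changes which matches are found first, not the total amount of work. In Python, `itertools.combinations(ids, 2)` yields each unordered pair exactly once in the same way.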
- Find out how many records we have.
1: Get a NEW record ID.
2: Get all text/string fields from the record and put them in $orig_string.
3: Remove all common words like "and", "this", etc. from $orig_string.
4: Remove garbage from $orig_string.
5: Format and delimit all found words in $orig_string.
6: Get a NEW record ID, not the same as the first one, of course.
7: Same as 2, but put them in $new_string.
8: Same as 3 (on $new_string).
9: Same as 4 (on $new_string).
10: Same as 5 (on $new_string).
11: Compare $orig_string and $new_string:
12: Get word 1 from $orig_string and compare it with all found words in $new_string.
13: Match found (fuzzy OR 100%)? Remember the record ID and maybe some other information, because we found a match!
14: Get word 2 from $orig_string.
15: Same as 13.
16: And so on. After doing all found words in $orig_string, select a new record that hasn't been processed yet.
17: Same as 1.
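The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: each record is a dict of field name to value, the stop-word list is a tiny hypothetical one, "garbage" means anything that is not an ASCII letter or digit, and the fuzzy match uses Python's `difflib.SequenceMatcher` ratio with a hypothetical cutoff of 0.8 (one possible fuzzy measure among many). `normalize` stands in for steps 2-5 and 7-10, `match_percentage` for steps 11-16, and `find_matches` for the loop over records.

```python
import re
from difflib import SequenceMatcher

# Assumption: a tiny illustrative stop-word list ("common words" in step 3).
STOP_WORDS = {"a", "an", "and", "the", "this", "of", "to", "is", "in"}


def normalize(record):
    """Steps 2-5: collect the text fields, lowercase them, drop garbage and common words."""
    text = " ".join(value for value in record.values() if isinstance(value, str))
    # Anything that is not an ASCII letter or digit is treated as garbage / a delimiter.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def words_match(a, b, threshold=0.8):
    """Step 13: a 100% match, or a fuzzy match whose similarity ratio reaches `threshold`."""
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold


def match_percentage(orig_words, new_words, threshold=0.8):
    """Steps 11-16: percentage of orig_words that match some word in new_words.

    Note: one-directional (orig -> new), like the steps above.
    """
    if not orig_words:
        return 0.0
    hits = sum(
        1 for word in orig_words
        if any(words_match(word, other, threshold) for other in new_words)
    )
    return 100.0 * hits / len(orig_words)


def find_matches(records, min_percentage=50.0):
    """Compare every pair of records once and remember the pairs that match well enough."""
    # Normalize each record once instead of repeating steps 7-10 for every comparison.
    words = {record_id: normalize(record) for record_id, record in records.items()}
    ids = list(records)
    matches = []
    for i, record_id in enumerate(ids):
        for other_id in ids[i + 1:]:  # only records that have not been processed yet
            pct = match_percentage(words[record_id], words[other_id])
            if pct >= min_percentage:
                matches.append((record_id, other_id, round(pct, 1)))
    return matches


# Hypothetical example data: record id -> {field name: value}.
records = {
    1: {"name": "Red cotton shirt", "description": "A shirt made of cotton", "price": 10},
    2: {"name": "Red coton shirts", "description": "Shirt of cotton"},
    3: {"name": "Blue denim jeans", "description": "Jeans and this belt"},
}

print(find_matches(records))  # [(1, 2, 83.3)]
```

Here record 1 matches record 2 on 5 of its 6 remaining words ("coton" is caught by the fuzzy match, "made" is not found), giving 83.3%. Because the score only looks from the first record to the second, you may want to compute it in both directions and keep the higher or average value.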
Can you advise me on this approach?