I need to be able to match addresses as closely as possible.
This is a list of addresses in a table that I need to match (CSV format)
I need to be able to match user input to match the record and obtain the
several column values.
"precinct_id","precinct","street_name","number_range","register_city","mail_city","city_ward","zip_code","voter_numbers","congressional_district","senatorial_district"
0,"0015A","NE 45TH AVE","918-1018","City of Ocala","OCALA",15,"34470",4,8,7
1,"0015A","NE 45TH CT","501-627","City of Ocala","OCALA",15,"34470",39,"8,7
2,"0015A","NE 45TH TER","33-597","City of Ocala","OCALA",15,"34470",59,"8,7
3,"0015A","NE 46TH AVE","721-1010","City of Ocala","OCALA",15,"34470",2,8,7
4,"0015A","NE 46TH CT","702-1237","City of Ocala","OCALA",15,"34470",56,8,7
5,"0015A","NE 46TH RD","1400-1729","City of Ocala","OCALA",15,"34470",35,8,7
6,"0015A","NE 47TH AVE","1400-1749","City of Ocala","OCALA",15,"34470",13,8,7
7,"0015A","NE 47TH CT","301-321","City of Ocala","OCALA",15,"34470",5,8,7
8,"0015A","NE 47TH CT","314-320","City of Ocala","OCALA",15,"34470",6,8,7
9,"0015A","NE 47TH CT","410-530","City of Ocala","OCALA",15,"34470",16,8,7
10,"0015A","NE 47TH CT","1427-1730","City of Ocala","OCALA",15,"34470",38,8,7
11,"0015A","NE 48TH AVE","310-615","City of Ocala","OCALA",15,"34470",23,8,7
12,"0015A","NE 49TH AVE","1520-2001","City of Ocala","OCALA",15,"34470",51,8,7
13,"0015A","NE 4TH PL","4006-4015","City of Ocala","OCALA",15,"34470",7,8,7
14,"0015A","NE 4TH ST","3709-4588","City of Ocala","OCALA",15,"34470",66,8,7
I'm unsure of how to proceed in getting these values.
As you know, the user can input their address in a number of different ways,
i.e.
NE 45th Court
Ne 45th CT
NE 45th CT
NE 45TH CT
NE 45TH COURT
etc...
There are literally thousands of records, oh, and not all of them are going to
have numbers in the street names 😉
I originally thought about getting the min/max number range and checking the
house number to see if it is in that range then returning only the streets
where the user's number is in the range and then using soundex on those
returned values. Here is the SQL to narrow it down to the rows that contain
the house number:
select * from precincts where 410 BETWEEN
convert(substr(number_range, 1, locate('-', number_range)-1), signed) AND
convert(substr(number_range, locate('-', number_range)+1), signed)
I know about full-text searching on mySQL and using soundex, levenshtein,
metaphone, and similar_text in PHP, however, I do not know ultimately which
database they will eventually be using. I am creating a mock-up. I am using
mySQL on the mock-up but they may chose to not use it. I was looking for
algorithms that used more PHP to do the searching than the SQL as I am not
sure how to do this using any of the PHP functions mentioned above..
Can anyone help out here, I'm lost on how to continue?