Address Matching algorithm(s)
Results 1 to 5 of 5

Thread: Address Matching algorithm(s)

  1. #1
    Senior Member
    Join Date
    Sep 2002
    Posts
    611

    Address Matching algorithm(s)

    I need to be able to match addresses as closely as possible.
    This is a list of addresses in a table that I need to match (CSV format)
    I need to be able to match user input to match the record and obtain the
    several column values.
    Code:
    "precinct_id","precinct","street_name","number_range","register_city","mail_city","city_ward","zip_code","voter_numbers","congressional_district","senatorial_district"
    0,"0015A","NE 45TH AVE","918-1018","City of Ocala","OCALA",15,"34470",4,8,7
    1,"0015A","NE 45TH CT","501-627","City of Ocala","OCALA",15,"34470",39,"8,7
    2,"0015A","NE 45TH TER","33-597","City of Ocala","OCALA",15,"34470",59,"8,7
    3,"0015A","NE 46TH AVE","721-1010","City of Ocala","OCALA",15,"34470",2,8,7
    4,"0015A","NE 46TH CT","702-1237","City of Ocala","OCALA",15,"34470",56,8,7
    5,"0015A","NE 46TH RD","1400-1729","City of Ocala","OCALA",15,"34470",35,8,7
    6,"0015A","NE 47TH AVE","1400-1749","City of Ocala","OCALA",15,"34470",13,8,7
    7,"0015A","NE 47TH CT","301-321","City of Ocala","OCALA",15,"34470",5,8,7
    8,"0015A","NE 47TH CT","314-320","City of Ocala","OCALA",15,"34470",6,8,7
    9,"0015A","NE 47TH CT","410-530","City of Ocala","OCALA",15,"34470",16,8,7
    10,"0015A","NE 47TH CT","1427-1730","City of Ocala","OCALA",15,"34470",38,8,7
    11,"0015A","NE 48TH AVE","310-615","City of Ocala","OCALA",15,"34470",23,8,7
    12,"0015A","NE 49TH AVE","1520-2001","City of Ocala","OCALA",15,"34470",51,8,7
    13,"0015A","NE 4TH PL","4006-4015","City of Ocala","OCALA",15,"34470",7,8,7
    14,"0015A","NE 4TH ST","3709-4588","City of Ocala","OCALA",15,"34470",66,8,7
    I'm unsure of how to proceed in getting these values.
    As you know, the user can input their address in a number of different ways,
    i.e.
    NE 45th Court
    Ne 45th CT
    NE 45th CT
    NE 45TH CT
    NE 45TH COURT
    etc...

    There are literally thousands of records, oh, and not all of them are going to
    have numbers in the street names

    I originally thought about getting the min/max number range and checking the
    house number to see if it is in that range then returning only the streets
    where the user's number is in the range and then using soundex on those
    returned values. Here is the SQL to narrow it down to the rows that contain
    the house number:

    Code:
    select * from precincts where 410 BETWEEN
    convert(substr(number_range, 1, locate('-', number_range)-1), signed) AND
    convert(substr(number_range, locate('-', number_range)+1), signed)
    I know about full-text searching on mySQL and using soundex, levenshtein,
    metaphone, and similar_text in PHP, however, I do not know ultimately which
    database they will eventually be using. I am creating a mock-up. I am using
    mySQL on the mock-up but they may chose to not use it. I was looking for
    algorithms that used more PHP to do the searching than the SQL as I am not
    sure how to do this using any of the PHP functions mentioned above..

    Can anyone help out here, I'm lost on how to continue?

  2. #2
    Pna lbh ernq guvf¿
    Join Date
    Jul 2004
    Location
    Kansas City area
    Posts
    19,414
    Quote Originally Posted by mrbaseball34
    I was looking for
    algorithms that used more PHP to do the searching than the SQL
    Whatever you come up with doing it that way, it's most likely going to perform poorly (in some cases extremely poorly) as compared to a solution that allows the DB to do what it does best - manage and retrieve the data you want.

    I was going to suggest using fulltext searching and ordering by relevance to allow the user to select the correct address. Also, these:
    Code:
    Ne 45th CT
    NE 45th CT
    NE 45TH CT
    are all the same, assuming you're using an encoding ending in _ci (which I don't see why you wouldn't... "NE 45th CT" isn't a different street than "NE 45TH CT" ... right?).

  3. #3
    Senior Member
    Join Date
    Sep 2002
    Posts
    611
    The problem I'm running into is that I can't get mySQL to set the full_text search min chrs to 2.
    I am using WAMP and change it in mysql.ini but it still won't do 2 chars.

    If I enter 'NE' it won't match anything, if I try matching 'PL' or 'CT' it won't match anything.
    But if I try matching '46TH' it returns every 46TH record. But, if I try '46TH PL', it ALSO returns every 46TH record. Why would that be?

    FT searching is a difficult dog to judge...

  4. #4
    Pna lbh ernq guvf¿
    Join Date
    Jul 2004
    Location
    Kansas City area
    Posts
    19,414
    Having never used WAMP, I'm unfamiliar with how it installs things. I'm assuming the "mysql.ini" you're referring to is analogous to the "my.ini" file MySQL installs by default? If so are you positive that the MySQL daemon is parsing this file?

    Also, can you show us the changes you made to the mysql.ini file (and where in the file you made them) to attempt to change the min word length?

    Finally, after you changed this value, did you pay attention to this key fact mentioned in the MySQL manual:
    he default minimum value is four characters; the default maximum is version dependent. If you change either value, you must rebuild your FULLTEXT indexes.
    ?

  5. #5
    Senior Member
    Join Date
    Sep 2002
    Posts
    611
    I'm assuming the "mysql.ini" you're referring to is analogous to the "my.ini" file MySQL installs by default?
    Yes, I was mistaken.

    I added the value at the bottom, right after these lines:
    Code:
    [mysqld]
    port=3306
    ft_min_word_len = 2
    Yes, I also rebuilt the index after restarting mySQL service after changing the ini fie.

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •