First things first: everyone who helps me with this can have the end results if they want them.
Now, how do I set up SuckDMOZ from SourceForge? I cannot figure out how to install PHP on Windows XP and run things from the command line. So, two questions: how do I install SuckDMOZ, and how do I install PHP on Windows XP so that I can run commands from the command line? PLEASE DON'T JUST POINT ME TO PHP.NET AND SAY "READ THIS"!!! PLEASE!!
I will parse the entire DMOZ dump files on my computer using my servers. If someone can help me figure this out, I will gladly give everyone who is interested the parsed MySQL database, either as MySQL dumps or in CSV format.
Just tell me how to set this thing up, and specify which sections (if not all) you want and in what format. I will take it from there. I may need to ask about converting if I don't know how, but I am certainly willing to try and help!
Also, how do I purge a database of duplicate entries? (I don't have a unique ID column, so that won't get in the way.) I cannot just copy the entire set of INSERT commands because I have over 600,000 entries. I want a single entry for every URL in my database.
Here is the structure:
CREATE TABLE dbsearch (
Title varchar(255) default NULL,
URL varchar(255) default NULL,
Description varchar(255) default NULL,
Category varchar(255) default NULL,
E_Mail varchar(255) default NULL,
Status varchar(255) default NULL,
FULLTEXT KEY Category (Category),
FULLTEXT KEY description (Description),
FULLTEXT KEY title (Title)
) TYPE=MyISAM;
Here's a sample:
INSERT INTO dbsearch VALUES ('Exokernel: An Operating System Architecture for Application-Level Resource Management', '"http.cs.berkeley.edu/~gribble/osprelims/summaries/exokernel.html"', 'Summary of paper mentioning overview and design issues.', 'Submicrokernel', NULL, NULL);
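For the duplicate question, one possible approach is sketched below. It is untested against this data and rests on a few assumptions: the server is MySQL 4.1 or later (needed for CREATE TABLE ... LIKE), "duplicate" means two rows with the same URL value, and the names dbsearch_unique, dbsearch_old and url_unique are made up for illustration. The idea is to copy the rows into a second table that has a unique key on URL, so the server itself discards the repeats.

```sql
-- Sketch only: dbsearch_unique, dbsearch_old and url_unique are hypothetical names.
-- Assumes MySQL 4.1 or later and that a duplicate is a row with a repeated URL.

-- 1. Make an empty copy of the table, including its FULLTEXT keys.
CREATE TABLE dbsearch_unique LIKE dbsearch;

-- 2. Allow only one row per URL in the copy.
ALTER TABLE dbsearch_unique ADD UNIQUE KEY url_unique (URL);

-- 3. Copy the rows. INSERT IGNORE skips any row whose URL is already present
--    instead of stopping with a duplicate-key error.
INSERT IGNORE INTO dbsearch_unique SELECT * FROM dbsearch;

-- 4. Compare the row counts before swapping anything.
SELECT COUNT(*) FROM dbsearch;
SELECT COUNT(*) FROM dbsearch_unique;

-- 5. Once the result looks right, swap the tables. The original data
--    stays available as dbsearch_old.
RENAME TABLE dbsearch TO dbsearch_old, dbsearch_unique TO dbsearch;
```

A few things to keep in mind with this sketch: the row that survives for each URL is simply whichever one the server copies first, rows where URL is NULL are all kept because a unique key allows multiple NULLs, and URLs are compared using the column's collation, which is usually case-insensitive.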