Has anyone created a filter for DMOZ content?

I am building a search engine using the DMOZ ODP RDF (a lot of acronyms) data.

I want to make a web site search engine for a portal, which has a very specific audience (don't go wild here, it's a boring type of audience ;-)

In order to do it, I want to use the DMOZ data, but I want to filter the results for relevance to my audience.

The trouble is I don't know exactly where to start.

Has anyone created a filter for DMOZ data, or does anyone have some information to share on how one goes about filtering results?

Is it based on keywords? Categories? Is there some relevance meter in the data which I can harness?

TIA,
Jeff

    The data is XML formatted (RDF, as you say). If you want to build a parser, you'll need to study the W3C spec to see what all the tags mean. Then check out PHP's expat-based XML parser extension.

    Wish I could be more helpful, but it sounds like a pretty big dataset to try to tackle directly with PHP. My suggestion:

    1. Build a script that parses the file into a database.
    2. Use the database to perform your queries.

    From what I can tell about the format, it should parse easily into a simple database table, which you can then search using SQL LIKE clauses.
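
    As a rough illustration of those two steps, here is a minimal sketch in Python rather than PHP (Python's standard library wraps the same expat parser). It assumes the dump uses `ExternalPage` elements with an `about` attribute and `d:Title`, `d:Description` and `topic` children, so check those names against your actual file. The `pages` table, the function names and the sample URLs are made up for the example.

```python
import sqlite3
import xml.parsers.expat

# Tiny stand-in for the real dump; the element names are an assumption
# about the DMOZ content file and should be verified against it.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
<ExternalPage about="http://example.com/a">
  <d:Title>Example Accounting Site</d:Title>
  <d:Description>Bookkeeping tips for small firms.</d:Description>
  <topic>Top/Business/Accounting</topic>
</ExternalPage>
<ExternalPage about="http://example.org/b">
  <d:Title>Example Games Site</d:Title>
  <d:Description>Reviews of video games.</d:Description>
  <topic>Top/Games/Video_Games</topic>
</ExternalPage>
</RDF>
"""

# Maps the (assumed) child element names to column names.
FIELDS = {"d:Title": "title", "d:Description": "description", "topic": "topic"}


def load_pages(xml_bytes, conn):
    """Step 1: stream the XML through expat and insert one row per page."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT, title TEXT, description TEXT, topic TEXT)"
    )
    page = None   # the page currently being read, if any
    field = None  # the column that character data should be appended to

    def start(name, attrs):
        nonlocal page, field
        if name == "ExternalPage":
            page = {"url": attrs.get("about", ""),
                    "title": "", "description": "", "topic": ""}
        elif page is not None and name in FIELDS:
            field = FIELDS[name]

    def chars(data):
        # expat may deliver one text node in several chunks, so append.
        if page is not None and field is not None:
            page[field] += data

    def end(name):
        nonlocal page, field
        if name in FIELDS:
            field = None
        elif name == "ExternalPage" and page is not None:
            conn.execute(
                "INSERT INTO pages VALUES (?, ?, ?, ?)",
                (page["url"], page["title"], page["description"], page["topic"]),
            )
            page = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_bytes, True)
    conn.commit()


def pages_in_category(conn, prefix):
    """Step 2: filter by category with a LIKE prefix match on the topic path."""
    return conn.execute(
        "SELECT url, title FROM pages WHERE topic LIKE ? ORDER BY url",
        (prefix + "%",),
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_pages(SAMPLE, conn)
print(pages_in_category(conn, "Top/Business/"))
```

    For the full dump you would open the file in binary mode and hand it to `parser.ParseFile(f)` instead of loading it into memory. Filtering on the `topic` path is probably the easiest way to restrict results to one audience, and a keyword filter would be the same query with `description LIKE '%keyword%'`.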

    Of course, if someone's already created a script to do that then you're all set 🙂
