XPath help - getting PHP DOMDocument up and running

bernard_hinault · May 29, 2011

I want to parse a site using PHP's DOMDocument, since it is said to be faster and easier to use. Some of you have convinced me! One question, since I am a PHP newbie ;-) : can I apply the XPath expressions I gathered (see below)?

Example: http://buergerstiftungen.de/cps/rde/xchg/SID-F8780E81-ABF20567/buergerstiftungen/hs.xsl/db.htm

Goal: to fetch the results (approx. 213 different records) and parse them in order to get a database dump that can be saved in a local MySQL database.

By the way, see two result pages:

http://buergerstiftungen.de/cps/rde/xchg/SID-F8780E81-ABF20567/buergerstiftungen/hs.xsl/db_20302.htm
http://buergerstiftungen.de/cps/rde/xchg/SID-F8780E81-ABF20567/buergerstiftungen/hs.xsl/db_20289.htm

As you can see, there is a lot of information stored on each page...

Well, I tried to write a scraper in Perl, but I had no luck. Perl is very hard for newbies. Afterwards I tried to write a parser in PHP, which is a bit easier. But the site (see the detail result pages) is a bit complex. How do I parse the pages in order to get the dataset into a local MySQL database? I want to have the data locally (on my OpenSuse Linux system, version 11.3) in a MySQL database, because that gives me more options for retrieval.

Well, I have three parts:

fetching
parsing
storing (in MySQL, that is, creating a MySQL dump)
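
The three parts could be sketched roughly like this. This is only a minimal sketch under assumptions: the function names (fetchPage, loadXPath, toInsertStatement), the table name and the column names are all hypothetical, and addslashes is used only to keep the dump lines readable. With a live database connection, prepared statements would be the safer choice.

```php
<?php
// Sketch of the fetch / parse / store steps. All function, table and
// column names here are hypothetical and only serve as illustration.

// Fetching: returns the HTML of one page, or null if the request failed.
function fetchPage($url)
{
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// Parsing: loads the HTML into a DOMDocument and returns a DOMXPath for queries.
function loadXPath($html)
{
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    return new DOMXPath($doc);
}

// Storing: builds one INSERT line for a dump file from a column => value array.
function toInsertStatement($table, array $record)
{
    $columns = array();
    $values = array();
    foreach ($record as $column => $value) {
        $columns[] = '`' . $column . '`';
        $values[] = "'" . addslashes((string) $value) . "'";
    }
    return 'INSERT INTO `' . $table . '` (' . implode(', ', $columns)
        . ') VALUES (' . implode(', ', $values) . ');';
}

// Example with made-up values (no network access needed):
echo toInsertStatement('stiftungen', array('name' => 'Example', 'kontakt' => "O'Neil")), "\n";
```

Each record from the result pages would go through fetchPage, then loadXPath, and the extracted fields would be handed to toInsertStatement and written line by line into a .sql file.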

Since I have very little experience with XPath, I use the XPather tool in my Mozilla browser. But I am not sure how to apply the expressions it gives me (see the data I gathered below). Perhaps some of you can help me here and show me how to apply them in parser code.

I would love to hear from you.

Here are some details for the results (from the approx. 213 different records). From the two result pages I gathered some XPath data:

Example: Bürgerstiftung Wiesloch http://buergerstiftungen.de/cps/rde/xchg/SID-A7DCD0D1-702CE0FA/buergerstiftungen/hs.xsl/db_20289.htm

/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/p

Gründungsgeschichte: /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[1]/strong
Kurzvorstellung/Ziele: /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[2]/span[2]/span/b
Projekte: /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[3]/span[2]/span/strong

Kontakt: /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/h6

Well, how do I apply them with libxml in order to get the parser part up and running?
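
A minimal sketch of how such expressions might be applied with DOMDocument and DOMXPath (both sit on top of libxml). The HTML below is an invented stand-in that only mimics the div ids from the XPaths above, not the real page, and the array keys (contact_heading, contact_text, founding_label) are hypothetical field names. Note that paths copied from a browser tool describe the browser's view of the page, so they may need adjusting against what libxml actually parses.

```php
<?php
// Invented sample page: it only reproduces the nesting and ids used in the
// XPath expressions, so the queries can be tried without network access.
$html = <<<'HTML'
<html><body>
<div id="main"><div id="wrapper"><div id="inner">
  <div id="contentblock"><div>
    <p><strong>Gruendungsgeschichte</strong> Sample founding text.</p>
  </div></div>
  <div id="marginalblock"><div>
    <h6>Kontakt:</h6>
    <p>Sample contact text.</p>
  </div></div>
</div></div></div>
</body></html>
HTML;

libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Hypothetical field names mapped to XPath expressions from the post.
$base = "/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']";
$queries = array(
    'contact_heading' => $base . "/div[@id='marginalblock']/div[1]/h6",
    'contact_text'    => $base . "/div[@id='marginalblock']/div[1]/p",
    'founding_label'  => $base . "/div[@id='contentblock']/div/p[1]/strong",
);

$record = array();
foreach ($queries as $field => $query) {
    $nodes = $xpath->query($query); // DOMNodeList, or false for an invalid expression
    $record[$field] = ($nodes !== false && $nodes->length > 0)
        ? trim($nodes->item(0)->textContent)
        : null; // nothing matched on this page
}

print_r($record);
```

For a real page, $html would come from the fetched result page instead of the inline sample. If the absolute paths turn out to be brittle, a shorter form such as //div[@id='marginalblock']/div[1]/h6 selects the same node in this sample.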