I want to parse a site using PHP's DOMDocument, since it is faster and easier to use. Some of you have convinced me! One question, since I am a PHP newbie ;-) : can I apply the XPath expressions that I gathered (listed further down in this post)?
Example: http://buergerstiftungen.de/cps/rde/xchg/SID-F8780E81-ABF20567/buergerstiftungen/hs.xsl/db.htm
Goal: to fetch all the results (approx. 213 different records) and parse them, in order to get a database dump that I can save to a local MySQL DB.
By the way, here are two of the result pages:
http://buergerstiftungen.de/cps/rde/xchg/SID-F8780E81-ABF20567/buergerstiftungen/hs.xsl/db_20302.htm
http://buergerstiftungen.de/cps/rde/xchg/SID-F8780E81-ABF20567/buergerstiftungen/hs.xsl/db_20289.htm
As you can see, there is a lot of information stored on each page.
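To reach all of the approx. 213 records, the detail links would first have to be collected from the overview page. Here is a minimal sketch of that step. It assumes the overview HTML links to the detail pages with hrefs ending in db_<number>.htm (as in the two URLs above); I have not verified that, and extractDetailLinks() is a made-up helper name.

```php
<?php
// Hypothetical helper: collect all hrefs that look like detail pages
// (ending in "db_<digits>.htm") from the HTML of an overview page.
function extractDetailLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (preg_match('/db_\d+\.htm$/', $href)) {
            $links[$href] = true; // array keys remove duplicates
        }
    }
    return array_keys($links);
}
```

The hrefs may be relative, so they probably need to be joined with the base URL before fetching.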
Well, I have tried to write a scraper in Perl, but I had no luck. Perl is very, very hard for newbies. Afterwards I tried to write a parser in PHP, which is a bit easier. But the site (see the detail result pages) is a bit complex. How do I parse the pages in order to get the dataset into a local MySQL database? Once the data is stored locally (on my openSUSE Linux 11.3 system), I have more options for retrieval.
So I have three parts:
- fetching
- parsing
- storing (in MySQL, i.e. creating a MySQL dump)
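The fetching and storing parts could be sketched roughly like this. It is only a sketch under assumptions: the table name stiftungen, the column names, and the helpers fetchPage() and toInsertStatement() are all made up for illustration, and the escaping is a simplified one meant for writing a dump file.

```php
<?php
// Step 1 (fetching): download one page; returns null on failure.
// Hypothetical helper name. Requires allow_url_fopen to be enabled.
function fetchPage(string $url): ?string
{
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// Step 3 (storing): turn one parsed record (column => value) into a
// single INSERT line for a dump file. Hypothetical helper name.
function toInsertStatement(string $table, array $record): string
{
    $columns = [];
    $values  = [];
    foreach ($record as $column => $value) {
        $columns[] = '`' . str_replace('`', '``', $column) . '`';
        // Simplified escaping: backslashes first, then single quotes.
        $values[] = "'" . str_replace(['\\', "'"], ['\\\\', "\\'"], (string) $value) . "'";
    }
    return sprintf(
        'INSERT INTO `%s` (%s) VALUES (%s);',
        str_replace('`', '``', $table),
        implode(', ', $columns),
        implode(', ', $values)
    );
}

// Usage with a hand-written record (table and column names are assumptions):
$record = ['name' => 'Bürgerstiftung Wiesloch', 'page' => 'db_20289.htm'];
echo toInsertStatement('stiftungen', $record), PHP_EOL;
```

When inserting directly into a running MySQL server instead of writing a dump, prepared statements (PDO or mysqli) are the safer choice than hand-made escaping.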
I have only very little experience with XPath, but I have the XPather tool in my Mozilla browser. I am not sure how to apply the expressions it gives me; see the data I gathered below. Perhaps some of you can help me here and show me how to apply them in parser code.
I would love to hear from you.
Here are some details for the results (from the approx. 213 different records). From the two result pages I gathered some XPath data:
Example: Bürgerstiftung Wiesloch http://buergerstiftungen.de/cps/rde/xchg/SID-A7DCD0D1-702CE0FA/buergerstiftungen/hs.xsl/db_20289.htm
/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/p
Gründungsgeschichte /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[1]/strong
Kurzvorstellung/Ziele /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[2]/span[2]/span/b
Projekte /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[3]/span[2]/span/strong
Kontakt: /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/h6
So how do I apply them with libxml, in order to get the parser part up and running?
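For reference, a minimal sketch of how the XPath expressions listed above could be applied with DOMDocument and DOMXPath (both built on libxml). The array keys, the helper firstText() and the function parseDetailPage() are made-up names. The paths are copied as gathered; since several of them end in strong or b, they may select only the bold label rather than the text after it, in which case the enclosing p would be the node to query.

```php
<?php
// Hypothetical helper: trimmed text of the first node matching $query, or null.
function firstText(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Hypothetical function: parse one detail page into an associative array.
function parseDetailPage(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings about invalid markup while loading.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $inner = "/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']";

    // Key names are assumptions; the paths are the ones from the post.
    $queries = [
        'marginal_text'        => "$inner/div[@id='marginalblock']/div[1]/p",
        'gruendungsgeschichte' => "$inner/div[@id='contentblock']/div/p[1]/strong",
        'kurzvorstellung'      => "$inner/div[@id='contentblock']/div/p[2]/span[2]/span/b",
        'projekte'             => "$inner/div[@id='contentblock']/div/p[3]/span[2]/span/strong",
        'kontakt'              => "$inner/div[@id='marginalblock']/div[1]/h6",
    ];

    $record = [];
    foreach ($queries as $field => $query) {
        $record[$field] = firstText($xpath, $query);
    }
    return $record;
}

// Usage (not executed here): $record = parseDetailPage(file_get_contents($url));
```

Two things to keep in mind: paths copied from a browser tool describe the browser's DOM, which can differ from what libxml builds from the raw HTML, so a path may come back empty and need adjusting. Also, loadHTML() assumes ISO-8859-1 unless the page declares its charset, which matters for the umlauts.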