Parsing custom tags/markup and blocks

Anon

I'm looking for a SMART way to parse custom tags out of an HTML document (held in a string). The tags will follow basic XHTML syntax:

<tag attribute="value"/>
<tag attribute="value">stuff including HTML</tag>

What I want to end up with is the attributes and values of the tags, plus the content "between" the tags. I know I can do this with regular expressions, but I don't know them very well, and the learning process gives me constant headaches...

Some of you have probably done this before and can tell me whether I'm stuck with regexp. Is there a more efficient (easy 🙂) way of producing this result? Any pointers to examples?

btubalinal

Why not just parse the file as if it were XML? Then you can use expat or DOM XML and grab the tags and attributes you need.

expat

http://www.php.net/manual/en/ref.xml.php

DOM XML

http://www.php.net/manual/en/ref.domxml.php
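
As a rough illustration of the XML-parsing approach suggested above, here is a minimal sketch. It assumes PHP 7 or later and uses the newer DOM extension (DOMDocument) rather than the PHP 4-era DOM XML extension linked above. The helper name collectTagAttributes and the sample markup are made up for this example, and the input is assumed to be well-formed XML with a single root element.

```php
<?php
// Hypothetical helper (not from the thread): returns one array of
// attribute name => value pairs for every <$tagName> element found.
// Assumes $markup is non-empty, well-formed XML with a single root element.
function collectTagAttributes(string $markup, string $tagName): array
{
    // Keep parse warnings out of the output; a failed parse returns false.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    if (!$doc->loadXML($markup)) {
        return []; // input was not well formed
    }

    $result = [];
    foreach ($doc->getElementsByTagName($tagName) as $element) {
        $attributes = [];
        foreach ($element->attributes as $attr) {
            $attributes[$attr->name] = $attr->value;
        }
        $result[] = $attributes;
    }
    return $result;
}

// Sample input, invented for illustration.
$markup = '<div><tag attribute="value"/><p>text</p>'
        . '<tag attribute="other">stuff</tag></div>';
print_r(collectTagAttributes($markup, 'tag'));
```

If the source is a fragment with several top-level elements, one option is to wrap it in a dummy root element before parsing.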

squashee

Thanks, that sounds like a reasonable way, but I guess it means the source has to be well-formed XHTML. I'm unsure about one thing, though: can I get the "contents" of one node in text form, including the tags?
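
One possible way to get a node's contents as text, tags included, is to serialize each child node back to markup. The sketch below is only an illustration under assumptions: it uses the PHP 5+ DOM extension (DOMDocument::saveXML called with a node argument), not the older DOM XML extension, and the helper name innerXml and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): returns the markup found
// between a node's opening and closing tags, child tags included.
function innerXml(DOMNode $node): string
{
    $xml = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes just that node.
        $xml .= $node->ownerDocument->saveXML($child);
    }
    return $xml;
}

// Sample input, invented for illustration; it must be well-formed XML.
$doc = new DOMDocument();
$doc->loadXML('<div><tag attribute="value">stuff <b>including</b> HTML</tag></div>');

$tag = $doc->getElementsByTagName('tag')->item(0);
echo innerXml($tag), "\n";
```

The result is re-serialized by the parser, so details such as quoting or self-closing tags may differ slightly from the original source text.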