Parsing RSS / RegEx n00b

ShipiboConibo · Jul 10, 2009

I must confess that I'm behind the curve on regular expression syntax. I
am trying to take an RSS feed and parse the info I want from it using
preg_match_all(). (I know there are more proper ways, and even PEAR libraries,
to do this, but I just need to keep things simple.) So far it's working great
when I do the following:

preg_match_all("/(<title>)(.*)(<\/title>)/", $rss, $titles, PREG_SET_ORDER);

The problem comes up with one of the tags I want to parse, called
'<content:encoded>'. It seems the colon is throwing everything off, and I've
poked around Google for regex info trying to make this work, but no luck. I
know this must be something simple I just don't understand about regex...

Can anyone help me with this? For bonus points, how would I then parse out
the image tags from the <content:encoded> section of the feed, given that
img tags don't use closing tags in HTML? My end goal is to get the image
HTML code from this data.

Thanks!
-Adam

nrg_alpha · Jul 10, 2009

For RSS parsing, I think I would consider SimpleXML instead of regex (you'll have to do your homework, as personally, I'm not well versed in it). My understanding is that SimpleXML is built for this kind of thing.
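
A minimal sketch of the SimpleXML route, for illustration only. The sample feed below is made up, and the namespace URI is an assumption (it is the one commonly declared for the content: prefix; check the xmlns:content attribute in your own feed). Prefixed elements such as <content:encoded> are reached through children() with the namespace URI:

```php
<?php
// Hypothetical sample feed; a real one would be fetched first,
// e.g. with file_get_contents().
$rss = <<<XML
<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <content:encoded><![CDATA[<p>Hello</p>
<img src="a.png" alt="A">]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into plain text nodes.
$feed = simplexml_load_string($rss, 'SimpleXMLElement', LIBXML_NOCDATA);

// Assumed namespace URI for the content: prefix (taken from the sample feed).
$contentNs = 'http://purl.org/rss/1.0/modules/content/';

$items = [];
foreach ($feed->channel->item as $item) {
    // children($ns) exposes the elements that live in that namespace.
    $encoded = (string) $item->children($contentNs)->encoded;
    $items[] = ['title' => (string) $item->title, 'content' => $encoded];
}
```

Each entry in $items then holds the item title and the raw HTML from <content:encoded>, which can be searched for img tags separately.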

On a regex side note... beware of using the greedy .* as opposed to the lazy .*?, as this can have ramifications for speed and, even worse, for results (as a rule, .* or .+ is frowned upon for those very reasons).
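
To make the greedy vs. lazy difference concrete, here is a small sketch on a made-up input string (the variable names are just for illustration). It also shows two things relevant to the original question: a colon is an ordinary character in a PCRE pattern and needs no escaping, whereas . does not match newlines unless the s modifier is set, which is one plausible reason a multi-line <content:encoded> block fails to match. The last pattern is a simple way to grab img tags, assuming no attribute value contains a ">":

```php
<?php
// Hypothetical input: two <title> elements on one line, followed by a
// <content:encoded> element that spans several lines.
$rss = "<title>One</title><title>Two</title>\n"
     . "<content:encoded>line 1\n<img src=\"a.png\">\nline 2</content:encoded>";

// Greedy: .* runs to the last </title> on the line, merging both titles.
preg_match_all('/<title>(.*)<\/title>/', $rss, $greedy);

// Lazy: .*? stops at the first </title>, giving one match per element.
preg_match_all('/<title>(.*?)<\/title>/', $rss, $lazy);

// The colon is literal; the s modifier lets . match newlines as well.
preg_match_all('/<content:encoded>(.*?)<\/content:encoded>/s', $rss, $content);

// img tags have no closing tag, so match from "<img" up to the next ">".
preg_match_all('/<img\b[^>]*>/i', $content[1][0], $images);
```

Here $greedy[1] ends up with a single merged capture, $lazy[1] with the two separate titles, and $images[0] with the img tag found inside the content block.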
