You realize that Google News has both an RSS and an Atom feed, right?
RSS: http://news.google.co.nz/?ned=nz&topic=n&output=rss
Atom: http://news.google.co.nz/?ned=nz&topic=n&output=atom
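As a rough sketch of why a feed is easy to consume, here is how the items of an RSS 2.0 document could be read with Python's standard library. The `SAMPLE_RSS` text and the `parse_rss_items` helper are made up for illustration; in practice you would download the feed from one of the URLs above and pass its body in instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for a downloaded feed body.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of (title, link) tuples, one per <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing.
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title.strip(), link.strip()))
    return items


if __name__ == "__main__":
    for title, link in parse_rss_items(SAMPLE_RSS):
        print(title, "->", link)
```

Because the feed is structured XML, the code only depends on the element names (`item`, `title`, `link`), not on how the site's pages are laid out.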
If you're talking about crawling a site to collect news links and harvest data, then you'd need to know what that site's HTML looks like and run a regex against it to pull out the data you want. This typically takes a lot of work, and if they change their HTML, your script will stop working. This is why it's better to ask for an RSS or Atom feed.
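To illustrate how fragile the regex approach is, here is a minimal sketch. Both HTML snippets, the `STORY_LINK` pattern, and the `scrape_stories` helper are invented for the example; a real site's markup will look different.

```python
import re

# Hypothetical markup as the site might serve it today.
OLD_HTML = '<div class="story"><a href="/news/1">Storm hits coast</a></div>'

# The same story after an imagined redesign of the page.
NEW_HTML = '<article><a class="headline" href="/news/1">Storm hits coast</a></article>'

# Pattern written against the old layout: captures the link and the headline.
STORY_LINK = re.compile(r'<div class="story"><a href="([^"]+)">([^<]+)</a></div>')


def scrape_stories(html):
    """Return a list of (href, headline) tuples matched by STORY_LINK."""
    return STORY_LINK.findall(html)


if __name__ == "__main__":
    print(scrape_stories(OLD_HTML))  # finds the story
    print(scrape_stories(NEW_HTML))  # finds nothing after the redesign
```

The pattern matches the old layout but silently returns nothing once the markup changes, even though the story and its link are still on the page.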
If you want to learn more, I suggest reading up on data harvesting. I wrote an article ages ago about picking up data from weather.com here: Simple Remoting. It may or may not help you.