Well, I have done this a couple of times, and usually it turns out to be a living nightmare. Unless whoever you're grabbing headlines from is offering them for grabbing, it can be pretty hard.
You write some sort of application that connects to the webserver you are grabbing headlines from, retrieves a document and saves it. Then you have to write a script (perl/sed/awk are pretty good for this) that determines where the headlines are located and does whatever you want with the data.
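To give a rough idea of the two steps, here is a minimal sketch in Python instead of perl/sed/awk, using only the standard library. It assumes the site wraps each headline in a link inside an <h2> tag, which is purely a guess for illustration; you would have to look at the real page source and adjust. The names HeadlineParser, extract_headlines and fetch are made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class HeadlineParser(HTMLParser):
    """Collect (title, href) for every <a> found inside an <h2>.

    The <h2><a> layout is an assumption; real sites will differ.
    """

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.href = None      # href of the link we are currently inside
        self.text = []        # text fragments of the current link
        self.headlines = []   # collected (title, href) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self.text).split())
            self.headlines.append((title, self.href))
            self.href = None
        elif tag == "h2":
            self.in_h2 = False


def extract_headlines(html):
    """Return a list of (title, href) tuples found in the HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


def fetch(url):
    """Download a page and decode it, falling back to UTF-8."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

You would call it as something like extract_headlines(fetch(url)) with the address of the page you are after. The parser is the part that breaks whenever the markup changes, so it pays to keep it separate from the fetching.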
Be aware that this probably involves a considerable amount of work. I have also found that lynx's -dump option can help you out, e.g. fetch the HTML document, then
lynx -dump document.html > rendered.txt
or similar. This might help your script determine the headline's actual text, and the rendered text can be easier to pull data from than the original HTML file. (Hyperlinks in -dump mode are listed at the end, which can be very useful as well.)
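As a rough sketch of how the dump can be used, the Python below pairs each numbered link marker in the rendered text with its URL from the list at the end. It assumes the usual dump layout, i.e. markers like [1] in the body and a "References" section of numbered URLs at the bottom; your lynx version may format this slightly differently, and link text that wraps onto a second line will be cut short. The function names are made up for this example.

```python
import re
import subprocess


def render(path):
    """Run `lynx -dump` on a local HTML file and return the rendered text."""
    result = subprocess.run(
        ["lynx", "-dump", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def split_dump(dump):
    """Split dump text into (body, {link number: url}).

    Assumes the link list follows a line reading "References".
    """
    body, sep, refs = dump.rpartition("\nReferences\n")
    if not sep:
        # No reference section found: treat everything as body text.
        return dump, {}
    links = {
        int(num): url
        for num, url in re.findall(r"^\s*(\d+)\.\s+(\S+)", refs, re.M)
    }
    return body, links


def linked_text(dump):
    """Return (text, url) pairs for each [n] marker in the body."""
    body, links = split_dump(dump)
    pairs = []
    # Link text runs from the marker to the next marker or end of line.
    for num, text in re.findall(r"\[(\d+)\]([^\[\n]+)", body):
        url = links.get(int(num))
        if url:
            pairs.append((text.strip(), url))
    return pairs
```

Calling linked_text(render("document.html")) gives you every link on the page with its visible text, and from there you still have to work out which of those are actually headlines.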
What really does piss you off is when you've spent three weeks making this work, and they suddenly change the site layout 🙂