Hi guys, I think this is a thread we can all contribute to. This place has saved my skin many times, so I'm devoting some time to starting interesting threads for everyone to share. Let's get started.
I need to write an HTML parser, meaning a script that can extract whatever pieces of information I want from a page's source code. Here is the scene: I have a directory named 'html' where all the html/htm files are stored. Let's say we're working with resumes in HTML format, which contain information such as name, age, address, work experience, etc. What I need to do is save the fields I want from each HTML file to a database.
The key is that we have to know certain patterns in order to fetch the information we need. For example, a resume may hold this pattern:
<b>Name : </b> Raymund
So the idea is to search the source for '<b>Name : </b>', then extract whatever follows it until we hit an ending pattern, for example a </td> tag if the value sits inside a table.
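To make the idea concrete, here is a rough sketch of the search-and-cut step. It is written in Python purely for illustration, and the function name extract_field and the sample string are made up for this example. It assumes the start and end markers appear in the source exactly as typed, with the same spacing and letter case.

```python
def extract_field(source, start_marker, end_marker="</td>"):
    """Return the text between start_marker and the next end_marker.

    Returns None when either marker cannot be found.
    """
    start = source.find(start_marker)
    if start == -1:
        return None  # the label is not in this file
    start += len(start_marker)  # skip past the label itself

    end = source.find(end_marker, start)
    if end == -1:
        return None  # no ending pattern after the label

    # Slice out the value and trim surrounding whitespace.
    return source[start:end].strip()


# Hypothetical resume fragment, based on the pattern above.
sample = "<tr><td><b>Name : </b> Raymund</td></tr>"
print(extract_field(sample, "<b>Name : </b>"))
```

The same function could be called once per field (age, address, and so on) with a different start marker each time. It will break if the markup varies between files, so a real HTML parser would probably be more robust than plain string searching.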
My main problem at the moment is that I would like to load the source code of an HTML file into a variable so that I can use substr on it. Or do you have other ideas for doing this?
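For the loading part, something along these lines is what I have in mind. Again this is only a Python sketch under my own assumptions: the function name load_html_sources is made up, the directory is the 'html' folder described above, and the files are assumed to be readable as UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
from pathlib import Path


def load_html_sources(directory="html"):
    """Read every .html/.htm file in directory into a string.

    Returns a dict mapping file name -> full source text.
    A missing directory simply yields an empty dict.
    """
    folder = Path(directory)
    sources = {}
    if not folder.is_dir():
        return sources

    for path in sorted(folder.iterdir()):
        # Only pick up regular files with an HTML extension.
        if path.is_file() and path.suffix.lower() in (".html", ".htm"):
            sources[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return sources
```

Once each file's source is in a plain string, slicing it (the equivalent of substr) or searching it for a pattern works as with any other string.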
Any ideas you can contribute would be VERY much appreciated. I do have a sample parser, but I'm not allowed to redistribute it or I may get fired and jailed. It's all about classes and full of -> things, and it's giving me a headache. I'll start studying that code tomorrow, but I'm free to do it my own way, which will probably end up being the easier, dumber way.
Hope everyone can help here, it's critical. Hehe.