hi all
i am about to embark on a pattern matching journey. I'm hoping that someone can recommend which strain of pattern matching functions is the best weapon to deal with my file issues. Perl-compatible or POSIX extended? what's the difference?
my fleet is 4 or 5 directories full of handsome and seaworthy HTML files. I need to bravely steer them to new directories every bit as good as their old ones and named identically.
During this journey from an old folder named /src/foo, a typical html file will move to another folder named /output/foo and will undergo the following changes which will strip out headers and footers while retaining critical javascript and css references.
1) CSS file references in the header must be extracted and put into an array.
2) any javascript in the header must be extracted and saved in a variable. this variable must be cleansed of certain javascript functions which i will be including on every page as a js file reference in the html.
3) the headers and footers must be removed. there are 5 varieties of these header/footer pairs with different image files but consistent formatting--i'm nearly certain headers all end with <td background="images/1bg.gif">
footers all begin with:
</td>
</tr>
<tr>
<td><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="14" background="images/one_of_5_images.gif">
4) all links internal to the site will be changed from .html to .php. external site references will be unaffected.
There are more things i will need to tweak but i figured this is plenty to start with. if done right, this script will save me weeks of painful and messy hand coding.
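The script itself could be little more than a loop that mirrors the directory tree. Below is a minimal sketch in Python with `pathlib`; `convert_tree` and its `transform` argument are hypothetical names, and writing the output with a `.php` suffix is an assumption based on the link change in step 4.

```python
from pathlib import Path


def convert_tree(src_root, out_root, transform, new_suffix=".php"):
    """Run transform(text) on every .html file under src_root and write the
    result to the same relative path under out_root (hypothetical helper)."""
    src_root, out_root = Path(src_root), Path(out_root)
    written = []
    for src in sorted(src_root.rglob("*.html")):
        # /src/foo/page.html -> /output/foo/page.php
        dest = (out_root / src.relative_to(src_root)).with_suffix(new_suffix)
        dest.parent.mkdir(parents=True, exist_ok=True)
        # latin-1 maps every byte to a character, so reading and writing with
        # it avoids decode errors on old pages of unknown encoding.
        text = src.read_text(encoding="latin-1")
        dest.write_text(transform(text), encoding="latin-1")
        written.append(dest)
    return written
```

Calling `convert_tree("/src", "/output", some_function)` would then process each page with whatever per-file function does the header, footer, and link work.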
any advice would be greatly appreciated! the pattern matching functions confuse me enough and i'd like to start off on the right foot.