I am trying to import a glossary from an MS Word table into MySQL while preserving its formatting: PHP has to be able to print the entries with the bold, italics and list items they have in the original document. So far I have managed to import single-line entries through a laborious method, the only one I have been able to figure out:
- convert the glossary into HTML;
- apply regular expressions, like this:
Search:
<TR><TD WIDTH="31%" VALIGN="TOP">\p<FONT FACE="Arial"><P>(.*)</FONT></TD>\p<TD WIDTH="62%" VALIGN="TOP">\p<FONT FACE="Arial"><P>(.*)</FONT></TD>\p<TD WIDTH="8%" VALIGN="TOP">\p<FONT FACE="Arial"><P>Glossary's name</FONT></TD>\p</TR>\p
Replace with:
\p(null, '\1', '\2', 'Glossary's name'),
That is supposed to generate lines that I can INSERT into my database. It works fine with single-line entries, but it fails when either column holds two or more lines, and it gets even worse when the entry contains a list, like:
<TR><TD WIDTH="31%" VALIGN="TOP">
<FONT FACE="Arial"><P>order</FONT></TD>
<TD WIDTH="62%" VALIGN="TOP">
<FONT FACE="Arial"><P>Ordem, pedido (de mercadoria), encomenda. Estado, condição.</P>
<UL>
<B><LI>In good order</B>: em bom estado</LI></UL>
<P>Também ordem de pagamento.</P>
<UL>
<B><LI>Postal order</B>: ordem de pagamento.</UL>
</FONT></TD>
<TD WIDTH="8%" VALIGN="TOP">
<FONT FACE="Arial"><P>Glossary's name</FONT></TD>
</TR>
Can any regular expression ace here suggest a regex that will accommodate an unpredictable number of lines in each entry? I've spent the whole day working on it and can't find the magic formula.
Alternative ways to achieve this are very welcome too.
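One alternative I can imagine, as a rough sketch rather than a tested solution, is to do the job in two passes with a small script instead of a single search and replace in the editor: first grab each row with a non-greedy pattern whose dot is allowed to span line breaks, then grab each cell inside the row. The sketch below uses Python's `re` module for this; the names `parse_rows`, `ROW`, `CELL` and `FONT` are my own, and it assumes the tables are not nested and that every row and cell has its closing tag.

```python
import re

# Non-greedy patterns; re.S lets "." span line breaks, re.I ignores tag case.
ROW = re.compile(r"<TR[^>]*>(.*?)</TR>", re.S | re.I)
CELL = re.compile(r"<TD[^>]*>(.*?)</TD>", re.S | re.I)
FONT = re.compile(r"</?FONT[^>]*>", re.I)


def parse_rows(html):
    """Return one tuple of cell strings per table row."""
    rows = []
    for row in ROW.findall(html):
        cells = []
        for cell in CELL.findall(row):
            cell = FONT.sub("", cell)             # drop the presentational FONT tags
            cells.append(" ".join(cell.split()))  # collapse line breaks and runs of spaces
        rows.append(tuple(cells))
    return rows


if __name__ == "__main__":
    # Shortened version of the multi-line entry shown above.
    sample = """<TR><TD WIDTH="31%" VALIGN="TOP">
<FONT FACE="Arial"><P>order</FONT></TD>
<TD WIDTH="62%" VALIGN="TOP">
<FONT FACE="Arial"><P>Ordem, pedido.</P>
<UL>
<B><LI>In good order</B>: em bom estado</LI></UL>
</FONT></TD>
</TR>"""
    for cells in parse_rows(sample):
        print(cells)
```

The formatting tags (P, B, UL, LI) are kept exactly as Word wrote them, unclosed tags included, so they would probably still need tidying before PHP prints them. The resulting tuples could then be handed to a database driver as parameters of a prepared INSERT, which would also sidestep the quoting problem with apostrophes such as the one in "Glossary's name".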
Thank you,
Luciano ES
Santos, SP - Brasil