Thanks for the input, guys.
Weedpacket;11046425 wrote:The second question is more straightforward to answer than the first so I'll answer it first: it refers to the version of XML used (1.0 or 1.1).
Thanks for the links here. I find it curious that DOMDocument has both a loadHTML method and a loadXML method, and both seem to choke on the HTML source file I'm working with. The loadXML method throws a LOT more errors than loadHTML does, which seems backwards to me. I also find it bothersome that all these parse errors surface as E_WARNING notifications, thereby sneaking their way into the script output. Seems it would be better to collect such warnings in an array or something rather than barfing them out to STDOUT or STDERR.
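For what it's worth, PHP's libxml functions appear to cover the "collect them in an array" wish. A minimal sketch, assuming a made-up $html string standing in for the real file: libxml_use_internal_errors(true) buffers the parse errors instead of raising E_WARNINGs, and libxml_get_errors() hands them back as an array of LibXMLError objects.

```php
<?php
// Sketch only: $html is a made-up sample, not the actual exported page.
$html = '<!DOCTYPE html><html><head><title>t</title></head>'
      . '<body><header><nav>menu</nav></header><p>Hello</p></body></html>';

// Buffer libxml errors internally instead of emitting E_WARNINGs;
// keep the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Array of LibXMLError objects (level, code, line, column, message).
$errors = libxml_get_errors();

// Empty the buffer and restore the earlier error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Whether any errors show up for a given document depends on the libxml2 version PHP was built against, so the loop may print nothing at all.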
loadHTML complains about:
* a <head> tag before the opening <body> tag, which makes sense ("htmlParseStartTag: misplaced <head> tag in Entity")
* a <header> tag, which is valid HTML5 unless I'm mistaken ("Tag header invalid in Entity")
* a <nav> tag (also valid HTML5?? --- "Tag nav invalid in Entity")
* a <section> tag (also valid HTML5?? --- "Tag section invalid in Entity")
* a <footer> tag (also valid HTML5?? --- "Tag footer invalid in Entity")
This makes me think that DOMDocument->loadHTML leaves something to be desired when it comes to parsing valid HTML5 documents. These errors seem entirely unrelated to the choice of character set.
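To keep the HTML5-element noise apart from the complaints that might actually matter, something like the following could work. It is only a sketch: splitHtml5TagErrors is a hypothetical helper, its tag list is a hand-picked partial one, and it assumes the messages start with "Tag <name> invalid" as in the warnings quoted above.

```php
<?php
// Hypothetical helper: split parser messages into "Tag X invalid" complaints
// about known HTML5 elements and everything else.
function splitHtml5TagErrors(array $messages): array
{
    // Partial, hand-picked list of HTML5 elements; extend as needed.
    $html5Tags = ['header', 'nav', 'section', 'footer', 'article', 'aside', 'main', 'figure'];
    $result = ['html5' => [], 'other' => []];

    foreach ($messages as $message) {
        $message = trim($message);
        if (preg_match('/^Tag (\w+) invalid/', $message, $m)
            && in_array(strtolower($m[1]), $html5Tags, true)) {
            $result['html5'][] = $message;
        } else {
            $result['other'][] = $message;
        }
    }

    return $result;
}

// Messages as quoted in this post.
$messages = [
    'htmlParseStartTag: misplaced <head> tag in Entity',
    'Tag header invalid in Entity',
    'Tag nav invalid in Entity',
];

$split = splitHtml5TagErrors($messages);
echo count($split['html5']), " HTML5 tag complaints, ", count($split['other']), " other\n";
```

The 'other' bucket is then the short list worth reading by hand, such as the misplaced <head> tag.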
Weedpacket;11046425 wrote:Pedantic niggle: the HTML5 spec defines two syntaxes for HTML5 documents, the HTML-based "HTML" syntax and the XML-based "XHTML" syntax. You might want to see which one you're producing but unless you're getting fancy with namespaces the most you'd need to do is check the doctype. (I don't actually know the situation re: the DOM extension's support for HTML5).
Hahaha, which one I'm producing, bwahahhaHAHAHA. No, this is someone else's formerly-WordPress site which some genius decided to export to static HTML rather than battle WordPress. I expect it will be my job to turn it back into some kind of dynamically generated site at some point.
For the love of GOD, this Microsoft-breaking-**** saga is still going on? When will it end?
Weedpacket;11046425 wrote:It was introduced in IE5, and I think discontinued in IE10 - (not to mention that IE11 or maybe 12 is supposed to be the last version of Internet Explorer - but it wouldn't be the first time Microsoft has announced the end of life for something that they then kept going).
DIE DIE DIE DIE DIE.
Bonesnap wrote:This wouldn't happen to be a WordPress site, would it? That code looks pretty much identical to a lot of the default themes' headers. I normally nuke about 95% of the default crap and use my own, partially for this reason.
You hit the nail on the head. The decision was made that WP presented a security problem, so the dynamic website was sacrificed and mummified into static HTML. Brilliant, don't you think?
Bonesnap wrote:And yeah, according to Microsoft, the conditional comments are ignored by IE10 and later. IE11 is the last version of Internet Explorer and their new browser, "Spartan", will be its successor. I have a hunch they're going to stick with the name Spartan much like how they stuck with Windows 7.
You'd think that with the $40B cash hoard MS used to have, they could have written a decent browser without ruining the internet for everyone.