Hello. I need a function that closes HTML tags that were opened but never closed. I have written something, but my function just appends the closing tags to the end of the string, which usually does not produce W3C-valid markup (and it must be valid). The function should close a tag before the next tag opens. For example, given this input:

<B>Hello! I invite you to my page about <U> games, it is best <I> page!</I> 

This function should return:

<B>Hello! I invite you to my page about </B><U> games, it is best </U><I> page!</I> 

What matters is that the result is W3C-valid.

The code of my function is below; I hope one of you can see how to fix this problem.

function repair_html($str) {
	# Define arrays
	$tags_open=Array('<b>','<i>','<u>','<a','<p');
	$tags_close=Array('</b>','</i>','</u>','</a>','</p>');	
	$tags_open_num=Array();
	$tags_close_num=Array();
	$tags_status=Array();

# Count opened tags
foreach($tags_open as $id => $tag) {
	$tag_big=strtoupper($tag);
	$tag_small=strtolower($tag);
	$count_big=substr_count($str,$tag_big);
	$count_small=substr_count($str,$tag_small);
	$count=$count_big+$count_small;

	$tags_open_num[$id]=$count;		
}

# Count closed tags
foreach($tags_close as $id => $tag) {
	$tag_big=strtoupper($tag);
	$tag_small=strtolower($tag);
	$count_big=substr_count($str,$tag_big);
	$count_small=substr_count($str,$tag_small);
	$count=$count_big+$count_small;

	$tags_close_num[$id]=$count;		
}

# Count opened but not closed tags
foreach($tags_open as $id => $tag) {
	$no_closed=$tags_open_num[$id]-$tags_close_num[$id];
	$tags_status[$id]=$no_closed;
}

# Close not closed tags
foreach($tags_open as $id => $tag) {
	$status=$tags_status[$id];
	echo '<b>Tag status '.htmlspecialchars($tag).' is '.$status.'</b><BR>';
	if($status>0) {
		$opened_num=$tags_open_num[$id];
		echo '- Opened tags: '.$opened_num.'<BR>';
		$close_tag=$tags_close[$id];
		echo '- Closing tag: '.htmlspecialchars($close_tag).'<BR>';
		# Append one closing tag per unclosed opening tag
		for($i=0;$i<$status;$i++) {
			$str.=$close_tag;
		}
	}
}

return $str;
}

$html='<b>Hello! How are you? I play<i> new computer game!';
$html_new=repair_html($html);

echo 'Old code:<BR>';
echo '<textarea rows="10" cols="100">'.$html.'</textarea><BR>';

echo 'New code:<BR>';
echo '<textarea rows="10" cols="100">'.$html_new.'</textarea><BR>';

I am sorry for my English; I am Polish.

Greetings,
ladovnik

PS: The 'echo' lines ('<b>Tag status ...' etc.) are only for testing and debugging.
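
The behaviour asked for above (close the currently open tag just before the next tag opens) can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested W3C validator: the function name close_before_next_tag, the list of void tags, and the simple tag regex (which does not cope with ">" inside attribute values) are all made up for this sketch.

```php
<?php
// Hypothetical sketch: keeps at most one tag open at a time and closes it
// before the next opening tag, as described in the question.
function close_before_next_tag(string $html): string
{
    // Assumed list of tags that never take a closing tag.
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link'];

    // Split into text and tag tokens, keeping the tags themselves.
    $tokens = preg_split(
        '/(<\/?[a-zA-Z][^>]*>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $open = null; // name of the currently open tag, as written in the input
    $out  = '';

    foreach ($tokens as $token) {
        if (!preg_match('/^<(\/?)([a-zA-Z][a-zA-Z0-9]*)/', $token, $m)) {
            $out .= $token; // plain text
            continue;
        }
        $name = $m[2];
        if (in_array(strtolower($name), $void, true)) {
            $out .= $token; // void tags do not change what is open
            continue;
        }
        if ($m[1] === '/') {
            // Keep a closing tag only if it matches the open tag; drop strays.
            if ($open !== null && strcasecmp($open, $name) === 0) {
                $out .= $token;
                $open = null;
            }
            continue;
        }
        if ($open !== null) {
            $out .= '</' . $open . '>'; // close the previous tag first
        }
        $out .= $token;
        $open = $name;
    }

    if ($open !== null) {
        $out .= '</' . $open . '>'; // close whatever is still open at the end
    }
    return $out;
}

echo close_before_next_tag('<B>Hello! I invite you to my page about <U> games, it is best <I> page!</I> ');
```

For the example in the question this produces the markup requested there. Note that it flattens all nesting, so bold text that was meant to contain underlined text loses the bold at that point.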

    For W3C compliance in general there are two things to look at:

    The tidy extension can be used to correct invalid HTML.

    Another possibility is to start with something like

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    echo $dom->saveHTML();

    The main problem with both, though, is that they'd turn your example into

    <b>Hello! I invite you to my page about <u> games, it is best <i> page!</i></u></b>

    (plus extra cruft to make the whole thing a valid document). That's closer to what I'd expect to happen, honestly - I wouldn't expect starting underlining to turn off bold (because then I'd expect starting bold to turn off underlining and then how would I have both?).
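
A hedged sketch of how the DOMDocument route might be applied to a news fragment while leaving out the extra document wrapper: parse the fragment inside a known wrapper document, then serialise only the children of the body element. The function name repair_fragment, the wrapper markup, and the UTF-8 charset hint are assumptions for illustration, not something from this thread.

```php
<?php
// Hypothetical helper: lets DOMDocument repair an HTML fragment and
// returns only the repaired fragment, without <html>/<body> around it.
function repair_fragment(string $html): string
{
    // Collect parser warnings about the broken markup instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Assumption: the input is UTF-8; the meta tag hints that to the parser.
    $dom->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html
        . '</body></html>'
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialise each child of <body> so the wrapper itself is left out.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$fixed = repair_fragment('<b>Hello! How are you? I play<i> new computer game!');
echo $fixed, "\n";
```

Unlike the flat closing asked for in the question, this keeps the nesting, so the unclosed tags end up closed in reverse order at the end of the fragment.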

      It does not matter how the revised code looks, but it must be W3C-valid, because my school's home page is entering a competition, and the teachers forget to close HTML tags in the news content, so I have to write a function to repair it.

        How are they adding the material? Providing them with an HTML WYSIWYG editor like TinyMCE should result in valid HTML.

          Yes, I know, but the school home page was originally made by someone else, and by now there are a lot of news items (more than 100), so the school IT teacher asked me to write a function that repairs the HTML in the news when it is displayed (echo repair_html($news_content);).

            Are there any other ideas? I'm not sure whether the Tidy extension is enabled in PHP on my school's server...

              Thanks very much, Weedpacket! The DOMDocument class really works!

              Greetings,
              ladovnik
