What are you actually trying to achieve? What's the purpose of this?

    I am making a site where users can register, create HTML/WML pages, and edit them. One of the edit options must be a UNITS edit, so I need to divide the HTML code of a file into units.

    That's why I asked for help. It is PROBABLY possible with preg_replace, so please, if somebody can help...

    These are the tags that need to be recognized and divided into units:

    <a href=""> </a> - UNIT LINK
    <img src="" alt=""> - UNIT IMAGE
    <p> </p> - UNIT PARAGRAPH
    <br/> - UNIT BREAK

    When there is some text (without tags), it should be recognized as UNIT TEXT.
    If a tag from the HTML file is not listed above, it should be shown as UNKNOWN UNIT.

    That's it.

      How are you defining UNIT? It does not exist in the specs in the way you are using the word.

        OK, my English is bad and I am being confusing. I will try to explain it more simply.

        I need a PHP script which will open an HTML file located on the same server ...fopen("test.html") ..

        Then I need to read it line by line, and if the PHP script finds in the HTML file, for example,

        <a href="http://google.com">GOOGLE</a>, I want to print that in the following format:

        LINK: google(http://google.com)

        So the script needs to recognize that it is a link and print it in the format above.

        I need the same for the <img> tag, <p>, <br>.... If the script finds a BR in the text, it will show BREAK LINE.

        I did that for A HREF:

        if(preg_match('/<a href=\"(.*?)\">.*?<\/a>/i', $buff2))
        {
            $buff2 = preg_replace('/<a href=\"(.*?)\">(.*?)<\/a>/i', "\n".'LINK: $1 ($2)', $buff2);
        }
        

        But I tried to do the same for <img> and it doesn't work.

        elseif(preg_match('/<img src=\"(.*?)\" alt=\"(.*?)\"\/>/i', $buff2))
        {
            $buff2 = preg_replace('/<img src=\"(.*?)\" alt=\"(.*?)\"\/>/i', "\n".'IMAGE: $1', $buff2);
        }
        

        Also, when there is only text that is not located between tags, it needs to be recognized as TEXT: followed by part of the text (10 characters, for example).

              $buff2 = preg_replace("/(.+)\n/i", 'TEXT: $1' . "\n", $buff2);
        
        

        The problem is that when there is an empty line in the HTML file, the PHP script shows it as TEXT: with nothing after it.

        Maybe now it's easier to understand.

          GoRide! wrote:

          Can I load a WML file using dom->loadXML(); ?

          Since WML is an application of XML, the answer is yes.

          Surprisingly, the DOM extension can also read HTML, even though HTML is not XML.
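
          As a rough illustration of both points, here is a minimal sketch. The WML and HTML strings are made-up samples, inlined so the snippet runs on its own; with real files you would presumably use load() and loadHTMLFile() instead.

          ```php
          <?php
          // Hypothetical WML snippet (well-formed XML), inlined for the example.
          $wml = '<?xml version="1.0"?>'
               . '<wml><card id="main" title="Demo"><p>Hello<br/>'
               . '<a href="http://google.com">GOOGLE</a></p></card></wml>';

          $xmlDoc = new DOMDocument();
          $xmlDoc->loadXML($wml);             // XML parser: input must be well-formed
          $wmlLink = $xmlDoc->getElementsByTagName('a')->item(0);
          echo $wmlLink->getAttribute('href'), "\n";

          // Hypothetical HTML with an unclosed <p> and a bare <br>, so not valid XML.
          $html = '<html><body><p>Hello<br><a href="http://google.com">GOOGLE</a></body></html>';

          $htmlDoc = new DOMDocument();
          libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
          $htmlDoc->loadHTML($html);          // HTML parser: tolerates the missing </p>
          libxml_clear_errors();
          $htmlLink = $htmlDoc->getElementsByTagName('a')->item(0);
          echo $htmlLink->textContent, "\n";
          ```

          Once loaded, both documents are queried through the same DOM methods, so the rest of the script does not need to care which loader was used.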

            Hm, okay... but can anybody tell me what's the problem with this code?

            elseif(preg_match('/<img src=\"(.*?)\" alt=\"(.*?)\"\/>/i', $buff2))
            {
                $buff2 = preg_replace('/<img src=\"(.*?)\" alt=\"(.*?)\"\/>/i', "\n".'IMAGE: $1', $buff2);
            }

              Apart from the fact that you're not using DOM? 🙂

              The most obvious thing to note is that the preg_match() test gains you nothing except having to do everything twice if there is a match.

              It also requires

              1. both src= and alt= attributes

              2. in that order

              3. no other attributes

              4. a single space between the "img" and the start of the src= attribute

              5. double-quoted attribute values

              6. no spaces around the '=' separating either attribute name from its value

              7. a single space between the ending quote on the src= attribute and the start of the alt= attribute

              8. no space after the end of the alt= attribute

              9. nothing between the end of the alt= attribute and the / at the end of the tag

              10. and a / at the end of the tag.

              If any of these conditions are not met then the pattern will not match.

              Apart from that, there's nothing wrong with the pattern (the double quotes don't need to be escaped as they're not considered significant, but since they aren't significant the extra escapes get ignored anyway). Obviously it won't work if $buff2 is the wrong variable to begin with, but like I said, that's obvious.
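
              To make the conditions concrete, here is a small sketch that runs the pattern from the question against a few sample tags. The sample tags are invented for illustration. Note that the form listed at the top of the thread, <img src="" alt="">, has no trailing slash, so it falls under condition 10.

              ```php
              <?php
              // The pattern from the question, unchanged.
              $pattern = '/<img src=\"(.*?)\" alt=\"(.*?)\"\/>/i';

              // Hypothetical sample tags; only the first satisfies every condition.
              $samples = array(
                  '<img src="a.png" alt="A"/>',   // meets all ten conditions
                  '<img src="a.png" alt="A" />',  // space after the alt= attribute (condition 8)
                  '<img src="a.png" alt="A">',    // no slash at the end of the tag (condition 10)
                  '<img alt="A" src="a.png"/>',   // attributes in the other order (condition 2)
                  "<img src='a.png' alt='A'/>",   // single-quoted values (condition 5)
              );

              $results = array();
              foreach ($samples as $tag) {
                  $results[$tag] = preg_match($pattern, $tag) === 1;
                  echo ($results[$tag] ? 'match:    ' : 'no match: '), $tag, "\n";
              }
              ```

              Every one of the non-matching tags is perfectly ordinary HTML, which is the practical argument for parsing the document instead of pattern-matching it.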

                OK, so I need to work with PHP DOM?
                Is there an example of how to do this with DOM?

                  $doc = new DOMDocument();
                  $doc->loadHTMLFile('source.html');

                  // The node list is live: replacing an <img> removes it from the list,
                  // so keep taking item(0) until the list is empty.
                  $images = $doc->getElementsByTagName('img');
                  while($images->length)
                  {
                      $image = $images->item(0);
                      $label = "IMAGE: ".$image->getAttribute('src');
                      $image->parentNode->replaceChild($doc->createTextNode($label), $image);
                  }

                  $doc->saveHTMLFile('target.html');

                  Of course, that's only an example. The real thing would no doubt require a bit of thought.

                    Hm, but how can I get the attributes for all tags?

                    I am selecting all tags from the HTML file with

                    $tags = $doc->getElementsByTagName('*');

                    The problem is that I cannot select only images or only some other tag; the result must be sorted in the same order as in the file.

                    If the first line in the file is <p>bla</p>, it must be first in the "unit" list.

                    Any help?

                      The list of elements returned by getElementsByTagName is in document order.

                      Now that you've got a list of all the elements in the document, the obvious thing to do would be to go through each element of the list and, depending on what sort of element it is, do something appropriate.

                        Sorry to bother you,

                        but if you have time, could you please write an example of working with some tag (no matter which) once all tags are loaded?

                          $doc = new DOMDocument();
                          $doc->loadHTMLFile('test.html');

                          // '*' selects every element; the list is in document order.
                          $elements = $doc->getElementsByTagName('*');
                          $length = $elements->length;
                          for($i=0; $i<$length; ++$i)
                          {
                              $element = $elements->item($i);
                              // Decide what to print from the element's tag name.
                              switch($element->tagName)
                              {
                              case 'img':
                                  echo "An image, src = ".$element->getAttribute('src')."\n";
                                  break;
                              case 'p':
                                  echo "A paragraph\n";
                                  break;
                              case 'br':
                                  echo "Break\n";
                                  break;
                              }
                          }
                          

                            Thank you very much! You have made my day 😃

                              Uh, I forgot something!
                              How do I recognize plain text with no tags?
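
                              One possible approach, sketched under assumptions: getElementsByTagName only returns elements, but text is stored in the DOM as separate text nodes, which show up when you walk an element's childNodes. The markup below is a made-up sample wrapped in a <div>, and the sketch only looks at the direct children of that <div>; nested content would need a recursive walk.

                              ```php
                              <?php
                              // Hypothetical input, inlined so the sketch is self-contained.
                              $html = '<html><body><div>hello world <a href="http://google.com">GOOGLE</a><br>'
                                    . '<p>bla</p><img src="pic.png" alt="pic"></div></body></html>';

                              $doc = new DOMDocument();
                              libxml_use_internal_errors(true);   // keep parser warnings quiet
                              $doc->loadHTML($html);
                              libxml_clear_errors();
                              $container = $doc->getElementsByTagName('div')->item(0);

                              $units = array();
                              foreach ($container->childNodes as $node) {
                                  if ($node->nodeType === XML_TEXT_NODE) {
                                      // Skip whitespace-only text so empty lines produce no unit.
                                      $text = trim($node->nodeValue);
                                      if ($text !== '') {
                                          $units[] = 'TEXT: ' . substr($text, 0, 10);
                                      }
                                  } elseif ($node->nodeType === XML_ELEMENT_NODE) {
                                      switch ($node->nodeName) {
                                          case 'a':
                                              $units[] = 'LINK: ' . $node->textContent
                                                       . ' (' . $node->getAttribute('href') . ')';
                                              break;
                                          case 'img':
                                              $units[] = 'IMAGE: ' . $node->getAttribute('src');
                                              break;
                                          case 'p':
                                              $units[] = 'PARAGRAPH';
                                              break;
                                          case 'br':
                                              $units[] = 'BREAK';
                                              break;
                                          default:
                                              $units[] = 'UNKNOWN UNIT';
                                      }
                                  }
                              }
                              echo implode("\n", $units), "\n";
                              ```

                              Trimming the text and skipping empty results also avoids the empty "TEXT:" lines mentioned earlier in the thread.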

                                It's going well for me...
                                When I select some tag, is it possible to get all the HTML code inside that tag?

                                For example, if I call getElementsByTagName("form"), can I get the code between <form> and </form>, with all the inputs?

                                  See the user notes on the DOMElement page of the PHP manual.
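
                                  For reference, a minimal sketch of one common way to do this: serialize each child of the element with saveHTML() and join the results. This relies on saveHTML() accepting a node argument, which it does from PHP 5.3.6 onward. The form markup and the file name save.php are made up for the example.

                                  ```php
                                  <?php
                                  // Hypothetical form markup, inlined so the sketch is self-contained.
                                  $html = '<html><body><form action="save.php">'
                                        . '<input type="text" name="title">'
                                        . '<input type="submit" value="Save">'
                                        . '</form></body></html>';

                                  $doc = new DOMDocument();
                                  $doc->loadHTML($html);
                                  $form = $doc->getElementsByTagName('form')->item(0);

                                  // Serialize every child of the form and join the pieces. The result is
                                  // the markup between <form> and </form>, without the form tag itself.
                                  $inner = '';
                                  foreach ($form->childNodes as $child) {
                                      $inner .= $doc->saveHTML($child);
                                  }
                                  echo $inner, "\n";
                                  ```

                                  Calling $doc->saveHTML($form) instead would return the same markup including the surrounding form tag.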

                                    OK, OK, here we go again... I have spent 4 hours trying to do this, and I don't know how.
                                    I have this code:

                                    <a href="http://google.com"><font size="5">google link</font></a> 
                                    

                                    I made it so that you can change the href, the link title, the font size... but when there is only

                                    <a href="http://google.com">google</a>
                                    

                                    how can I add <font size="$x"> </font> and put the current link title between the font tags? I did that, but the whole <font size=x> link title </font> code is recognized as a single node; I need it separated so it can be changed later.
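
                                    A possible sketch, assuming the goal is a real <font> element in the DOM instead of a string of markup: create the element with createElement(), move the link's existing children into it, then append it to the link. The variable $size stands in for $x from the question, and the sample markup is made up.

                                    ```php
                                    <?php
                                    // Hypothetical input: a link with no <font> wrapper yet.
                                    $html = '<html><body><div><a href="http://google.com">google</a></div></body></html>';
                                    $size = '5'; // stands in for $x from the question

                                    $doc = new DOMDocument();
                                    $doc->loadHTML($html);
                                    $link = $doc->getElementsByTagName('a')->item(0);

                                    // Only wrap when the link does not already start with a <font> element.
                                    $first = $link->firstChild;
                                    if (!($first instanceof DOMElement && $first->tagName === 'font')) {
                                        $font = $doc->createElement('font');
                                        $font->setAttribute('size', $size);
                                        // appendChild() moves a node, so this empties the link one child at a time.
                                        while ($link->firstChild) {
                                            $font->appendChild($link->firstChild);
                                        }
                                        $link->appendChild($font);
                                    }
                                    echo $doc->saveHTML($link), "\n";
                                    ```

                                    Because the <font> element and the text inside it are now separate nodes, a later edit can change the size attribute or the link title independently.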
