Why would it be incorrect? To me it seems very logical!
The parameter tells the script to return the weather for Antwerp rather than the general weather page, which is what you get if you leave the parameter out!
Besides, that part can't be wrong: $buffer does contain all the necessary information!
It's really the regular expression that is wrong!

    I'd agree, since it shouldn't make a difference whether you include a local file or some content via http.

    Now, do you get any output at all? (I think you probably should.)

    What struck me most: are you sure eregi_replace is what you need? The way your pattern looks, you'd only replace <title>some text</title> with some text, leaving all the rest as it is.

    preg_match() or, for things that might be in there more than once, preg_match_all() might be more useful since you can read all matches into an array for later use.
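A minimal sketch of that difference. The sample string is made up for illustration. (As a side note, eregi_replace() belongs to the old POSIX ereg family, which was deprecated in PHP 5.3 and removed in PHP 7.0, so the preg_* functions are the ones to use anyway.)

```php
<?php
// Made-up sample subject containing two titles.
$html = '<title>one</title> ... <title>two</title>';

// preg_match() stops at the first match: $first[0] is the whole match,
// $first[1] is what the first (...) group captured.
$count = preg_match('/<title>(.*?)<\/title>/', $html, $first);

// preg_match_all() collects every match (PREG_PATTERN_ORDER by default):
// $all[0] holds all full matches, $all[1] all first-group captures.
$total = preg_match_all('/<title>(.*?)<\/title>/', $html, $all);

echo $first[1], "\n";              // one
echo implode(', ', $all[1]), "\n"; // one, two
```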

      OK, this is what I've got now...

      $buffer = "<html><head><title>something</title></head><body>hello, this is some stupid text<!-- start forecast by LOC code (smb) --> this is the text I would like to extract <!-- end forecast by LOC code (smb) --> and this is some more stupid text <br><i>containing HTML and stuff, because I got it from a HTTP request</body></html>";
      
      echo htmlspecialchars($buffer);
      echo "<br><br><hr><br><hr><br><br>";
      // begin = "<!-- start forecast by LOC code (smb) -->"
      // end = "<!-- end forecast by LOC code (smb) -->"
      
      
      $bericht = preg_match_all("/.*start forecast(.*)end forecast.*/", "AAAAAA\\1", $buffer);
      echo htmlspecialchars($bericht);
      echo "<hr><hr>";
      preg_match_all('/<title>(.*?)title>/', $buffer, $aMatches);
      echo $aMatches[0];
      echo "<hr>";
      echo $aMatches[1];
      

      None of these work...
      The first one returns "0", so it's probably telling me there are no matches (there are...).
      The second prints "Array" twice, which makes me think either this is not an array or that array element doesn't exist...

      Does it matter what is inside the .*? I mean, if there is some text that could be interpreted as a regexp
      or anything else?
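For what it's worth: the text that .* runs over is only subject data and is never interpreted as a regex. What does matter is literal text placed inside the pattern itself. The begin/end markers in the comments above contain ( and ), which are regex metacharacters, so if you build a pattern from them it's safest to escape them with preg_quote(). A sketch using those marker strings; the short sample subject is made up:

```php
<?php
// Marker strings taken from the comments in the code above.
$begin = '<!-- start forecast by LOC code (smb) -->';
$end   = '<!-- end forecast by LOC code (smb) -->';

// preg_quote() escapes regex metacharacters such as ( ) and the given
// delimiter, so the markers are matched literally.
$pattern = '/' . preg_quote($begin, '/') . '(.*?)' . preg_quote($end, '/') . '/s';

// Made-up sample subject.
$subject = 'junk' . $begin . ' the forecast ' . $end . 'more junk';

if (preg_match($pattern, $subject, $m)) {
    echo trim($m[1]), "\n"; // the forecast
}
```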

        With
        print_r(array_values($aMatches));

        I found out I do have array entries, but they're all empty!
        Why?

          Hi again,

          this is from the manual:

          $matches[0] is the first set of matches, and $matches[0][0] has text matched by full pattern, $matches[0][1] has text matched by first subpattern and so on. Similarly, $matches[1] is the second set of matches, etc.

          (http://www.php.net/manual/en/function.preg-match-all.php)

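          So, since there is probably only one title, $matches[0] should be an array containing the first match, where $matches[0][0] contains the complete match and $matches[0][1] contains what was matched by (.*?), and $matches[1] would not be set at all. That layout is what you get with the PREG_SET_ORDER flag, though. Without a flag, preg_match_all() uses PREG_PATTERN_ORDER, which is the other way round: $matches[0][0] is the complete match and $matches[1][0] is what (.*?) matched. Either way $matches[0] is itself an array, which is why echo just prints "Array".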
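A small sketch comparing the two layouts on a one-title sample string (made up for illustration):

```php
<?php
$html = '<title>something</title>';
$pattern = '/<title>(.*?)<\/title>/';

// Default, PREG_PATTERN_ORDER: first index = group, second index = match number.
preg_match_all($pattern, $html, $byPattern);
// $byPattern[0][0] is '<title>something</title>' (full match)
// $byPattern[1][0] is 'something' (first group)

// PREG_SET_ORDER: first index = match number, second index = group.
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);
// $bySet[0][0] is '<title>something</title>'
// $bySet[0][1] is 'something'

// echo on an array only prints "Array"; print_r() shows the structure.
print_r($byPattern);
```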

            The problem is that it's all empty!
            Probably meaning nothing matched...

              Hi,

              what is this:
              preg_match_all("/.*start forecast(.*)end forecast.*/", "AAAAAA\\1", $buffer);
              ???

              Mind the syntax:

              preg_match_all ( string pattern, string subject, array matches [, int flags])

              So you are searching the string AAAAAA\1 for your pattern and saving the matches in $buffer, overwriting its former value.

              I tested your code without that line (and slightly modified), and it works fine:

              $buffer = "<html><head><title>something</title></head><body>hello, this is some stupid text<!-- start forecast by LOC code (smb) --> this is the text I would like to extract <!-- end forecast by LOC code (smb) --> and this is some more stupid text <br><i>containing HTML and stuff, because I got it from a HTTP request</body></html>";
              
              preg_match('/<title>(.*?)<\/title>/', $buffer, $aMatches);
              echo htmlspecialchars($aMatches[0]);
              echo "<hr>";
              echo htmlspecialchars($aMatches[1]);
              echo "<hr>";
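As a sketch of what the original preg_match_all() line was probably after, here are both candidate calls with their arguments in the documented order: preg_replace() takes pattern, replacement, subject and returns the new string, while preg_match() takes pattern, subject and the array to fill. Whether a replacement or a plain match was intended is a guess, and the shortened $buffer is sample input.

```php
<?php
// Shortened sample of the $buffer used in this thread.
$buffer = '<body>text<!-- start forecast by LOC code (smb) --> the forecast <!-- end forecast by LOC code (smb) --> more</body>';

// preg_replace(pattern, replacement, subject) returns a new string;
// $buffer itself is not modified. $1 refers to the first (...) group.
$replaced = preg_replace('/.*start forecast(.*)end forecast.*/', 'AAAAAA$1', $buffer);

// preg_match(pattern, subject, matches) fills $m; $m[1] holds the group.
// [^>]*--> skips the rest of the opening comment marker.
preg_match('/start forecast[^>]*-->(.*?)<!-- end forecast/', $buffer, $m);

echo htmlspecialchars($replaced), "<br>";
echo htmlspecialchars(trim($m[1])), "<br>"; // the forecast
```

Note that the greedy pattern leaves fragments of both comment markers in $replaced, which is why the preg_match() version anchors on the end of the opening comment instead.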

                Hello ,

                First, rename $buffer, or immediately copy its value into another variable: the first preg_match_all() call below stores its matches in $buffer and overwrites the HTML.
                Second, preg_match_all() stores its results in a multidimensional array, not a flat one.

                Try the following code, it works:

                $buffer = "<html><head><title>something</title></head><body>hello, this is some stupid text<!-- start forecast by LOC code (smb) --> this is the text I would like to extract <!-- end forecast by LOC code (smb) --> and this is some more stupid text <br><i>containing HTML and stuff, because I got it from a HTTP request</body></html>";
                $myvar = $buffer;
                echo htmlspecialchars($buffer);
                echo "<br><br><hr><br><hr><br><br>";
                // begin = "<!-- start forecast by LOC code (smb) -->"
                // end = "<!-- end forecast by LOC code (smb) -->"

                $bericht = preg_match_all("/.*start forecast(.*)end forecast.*/", "AAAAAA\\1", $buffer);

                echo htmlspecialchars($bericht);
                echo "<hr><hr>";
                //preg_match_all('/<title>(.*?)<\/title>/', $buffer, $aMatches);
                preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/", $myvar, $aMatches);

                for ($i = 0; $i < count($aMatches[0]); $i++)
                {
                    echo "Matched text: ".$aMatches[0][$i]."<br>";
                    echo "part 1: ".$aMatches[1][$i]."<br>";
                    echo "part 2: ".$aMatches[3][$i]."<br>";
                    echo "part 3: ".$aMatches[4][$i]."<br><br>";
                }

                  Uhmmmmm...

                  Didn't work... I still get an empty array with 4 indexes!

                  preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/", $HTML_file, $aMatches);

                  for ($i = 0; $i < count($aMatches[0]); $i++)
                  {
                      echo "Matched text: ".$aMatches[0][$i]."<br>";
                      echo "part 1: ".$aMatches[1][$i]."<br>";
                      echo "part 2: ".$aMatches[3][$i]."<br>";
                      echo "part 3: ".$aMatches[4][$i]."<br><br>";
                  }
                  print_r(array_values($aMatches));
                  ?>

                  Because all I got was an empty page showing the number of matches, I printed all the array values, which showed me that the result array is empty...

                    Could it be that I have problems because there are newlines inside that HTML code?
                    I can extract the title, and I can extract from
                    "start forecast" till "LOC code" (in between is only "by"),
                    but when I try from "start forecast" till "end forecast"
                    I have a problem!

                      In case anyone wonders what the problem was:
                      the pattern needed the /s modifier at the end, so that . also matches newlines and the match can span multiple lines!

                      I found it via one of the sites linked in the user comments of the PHP manual!
                      It's about Perl, but I have a solution now.
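A minimal sketch of the effect of /s. The multi-line subject is sample input built from the markers and text used earlier in the thread:

```php
<?php
// Multi-line sample subject, as an HTTP response typically is.
$page = "<body>\n<!-- start forecast by LOC code (smb) -->\n"
      . "the text I would like to extract\n"
      . "<!-- end forecast by LOC code (smb) -->\n</body>";

$pattern = '/start forecast[^>]*-->(.*?)<!-- end forecast/';

// Without /s the dot never matches "\n", so this finds nothing.
$withoutS = preg_match($pattern, $page);

// With /s (dotall) the dot also matches newlines.
$withS = preg_match($pattern . 's', $page, $m);

echo $withoutS, ' ', $withS, "\n"; // 0 1
echo trim($m[1]), "\n";            // the text I would like to extract
```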


                        This is a very good reply I got in a newsgroup a while ago; it might help people who are doing the same thing 😉
                        It's solved now!!!

                        I had a doozy of a time matching HTML from pages for a long time, and there
                        were 3 things that helped me.

                        1. Use non-greedy .*'s: .*? is your friend.
                        2. If you have to match over multiple lines, use 'ms' --> s/fjk(.*?)kd/ms
                        3. Normalize the webpage before you match against it. Here is what I
                          commonly do:
                          function normalize_page ( &$page ) {
                              $page = preg_replace( "/\r/", "", $page);   // strip out carriage returns
                              $page = preg_replace( "/\n/", "", $page);   // strip out newlines
                              $page = preg_replace( "/\t/", "", $page);   // bye bye tabs
                              //$page = preg_replace( "/ /", " ", $page); // use this with some caution
                              $page = preg_replace( "|</(.*?)>|", "</$1>\n", $page); // this breaks up tags onto new lines
                              $page = preg_replace( "|><|", ">\n<", $page); // this also breaks things up
                          }
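Tip 1 in a nutshell: a sketch comparing greedy and non-greedy matching on a made-up one-line sample with two tagged items.

```php
<?php
// Made-up sample: two tagged items on one line.
$page = '<b>Mon</b> rain <b>Tue</b> sun';

// Greedy: .* runs to the LAST </b>, swallowing everything in between.
preg_match_all('/<b>(.*)<\/b>/', $page, $greedy);

// Non-greedy: .*? stops at the FIRST </b>, giving one match per tag.
preg_match_all('/<b>(.*?)<\/b>/', $page, $lazy);

print_r($greedy[1]); // one element: "Mon</b> rain <b>Tue"
print_r($lazy[1]);   // two elements: "Mon", "Tue"
```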

                        Normalizing a page before regexing it also helps when the HTML format
                        changes "just a little" (a space here, a tab or newline there) in a way that
                        would otherwise break your regex. Normalizing the data first helps keep your
                        regexes functioning.

                        Just my $.02.

                        --Brian
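To illustrate the normalization point, a sketch with a hypothetical helper, strip_layout(), that only removes \r, \n and \t (a reduced version of the idea above, not Brian's full function). The two table-cell layouts are made-up sample input:

```php
<?php
// Hypothetical helper: drop carriage returns, newlines and tabs so that
// small layout changes don't affect the regex.
function strip_layout(string $page): string
{
    return preg_replace('/[\r\n\t]/', '', $page);
}

// The same content in two slightly different layouts (sample input).
$v1 = "<td>Antwerp</td><td>sunny</td>";
$v2 = "<td>Antwerp</td>\r\n\t<td>sunny</td>";

$pattern = '/<td>Antwerp<\/td><td>(.*?)<\/td>/';

// Without normalizing, the extra whitespace in $v2 breaks the match.
$raw = preg_match($pattern, $v2);

// After normalizing, the same pattern works on both layouts.
preg_match($pattern, strip_layout($v1), $m1);
preg_match($pattern, strip_layout($v2), $m2);

echo $raw, ' ', $m1[1], ' ', $m2[1], "\n"; // 0 sunny sunny
```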
