ripat

  • Feb 26, 2005
  • Joined Dec 4, 2004
  • The way the heredoc syntax works on my server requires a line feed between the end marker and the semicolon.

    Not working:

    // heredoc
    print <<<HTML
    ...
    anything here.... html code or php variables
    ....
    HTML;

    Working:

    // heredoc
    print <<<HTML
    ...
    anything here.... html code or php variables
    ....
    HTML
    ;

    Hope it helps.

    • This is what I use to make string replacements outside HTML tags.

      // function called by preg_replace_callback
      function mon_rplc($capture){ 
        $smiley = array(':)',':|',':D',':p'); 
        $icon = array('/smile/','/confused/','/big_grin/','/nyah/'); 
        return str_replace($smiley, $icon, $capture[1]).$capture[2];
      } 
      
      // sample text
      $txt = '<div id=":)">This <---- text :) outside tags<a href="mailto:p..."> mail me :p </a>whatever <a text href="http://text.com">text</a> here</div> and final text :D';
      
      // separates what is inside and outside html tags
      $out = preg_replace_callback('#(?:(?<=>)|^)((?:(?!<[/a-z]).)*)([^>$]*>|$)#is', "mon_rplc", $txt);
      
      // display result 
      echo htmlentities($out);

      It took me a while to write that regex. It's certainly not perfect, but it works. Hope this helps.
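
      For comparison, the same idea can be sketched without a callback by splitting the text into tag and non-tag chunks. This is only an illustration, not the method from the post: the function name replace_outside_tags() and the tag pattern are assumptions, and the pattern assumes attribute values never contain a ">" character.

      ```php
      <?php
      // Hypothetical helper: applies $map only to the text between HTML tags.
      function replace_outside_tags(string $html, array $map): string {
          // The capture group makes preg_split() keep the tags in the result,
          // so chunks alternate: text, tag, text, tag, ..., text.
          $parts = preg_split('#(</?[a-z][^>]*>)#i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
          foreach ($parts as $i => $part) {
              // even indexes hold text outside tags, odd indexes hold the tags
              if ($i % 2 === 0) {
                  $parts[$i] = strtr($part, $map);
              }
          }
          return implode('', $parts);
      }

      $map = array(':)' => '/smile/', ':D' => '/big_grin/');
      echo replace_outside_tags('<a title=":)">hi :)</a> :D', $map), "\n";
      ```

      Because the tag pattern requires a letter (or "/" plus a letter) right after "<", a stray "<----" in the text is treated as ordinary text, as in the sample above.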

      • I stand corrected. I just tried my regex above (I should have done that first, really), and even with an ungreedy quantifier (.*?) it will not work if the URLs are embedded in text. As the dot also matches the space, it will capture all the text preceding the first (com|edu|uk…..) it encounters.

        How about this:
        B[/B]

        This should match any embedded URLs, even if there is a newline character in the middle of one.

        I'm not sure how it will work in the JS regex flavor. Just try it and let us know.

        • Try this pattern

          \b.*\.(com|org|net|edu|info|us|gov)\b 

          But what about yahoo.fr or www.123.be ? The list in the alternation will be loooooong!

          Edit: If you don't want to list all the domain extension codes, you could try

          \b.*\.[a-z]{2,3}\b

          But it will catch anything ending with a group of 2 or 3 alphabetic characters, so it will also catch explorer.exe, for example.
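
          A quick way to see both limitations is to run the two patterns against a few sample strings. The sketch below wraps the patterns from this post in / delimiters; the helper first_match() and the sample sentences are made up for illustration.

          ```php
          <?php
          // Patterns quoted from the post, wrapped in / delimiters for preg_match().
          $listed  = '/\b.*\.(com|org|net|edu|info|us|gov)\b/';
          $generic = '/\b.*\.[a-z]{2,3}\b/';

          // Hypothetical helper: returns the matched text, or null when nothing matches.
          function first_match(string $pattern, string $subject): ?string {
              return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
          }

          var_dump(first_match($listed, 'see www.example.com now'));  // the greedy .* also grabs "see "
          var_dump(first_match($listed, 'visit yahoo.fr today'));     // .fr is not in the list
          var_dump(first_match($generic, 'run explorer.exe now'));    // a file name is caught too
          ```

          Note that the leading .* pulls in the text before the URL as well, which is the same problem described for embedded URLs in the reply above.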

          • Try this:

            $v = <<<TXT
            BUT I'm really confused on how I can make links safe. 
            How do I extract the THE_URL from like [ u r l ]http://www.your-link.com[ / u r l ]
            or like [ u r l ]http://www.your-other-link.com[ / u r l ]
            TXT
            ;
            
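            // backtick delimiter: a single-byte character that does not occur in the pattern
            // (an acute accent is multibyte in a UTF-8 source file and fails as a delimiter)
            $pattern = '`\[ u r l \]\s*((?:https?|ftp|file)://[-\w+&@#/%?=~|!:,.;]*[-\w+&@#/%=~_|])\s*\[ / u r l \]`';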
            $replace = '<a href="\1">\1</a>';
            $v = preg_replace($pattern, $replace, $v);
            
            echo $v;

            Of course you will have to remove the spaces from the tag in the pattern.
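
            Since the question in the sample text is about making links safe, here is a minimal sketch that also escapes the captured URL before it is written into the HTML. It is an assumption-laden variant, not the code above: the function name bbcode_url_to_link() is made up, the tag is written without spaces, and the URL pattern is deliberately simpler (anything up to whitespace or a square bracket).

            ```php
            <?php
            // Hypothetical helper: turns [url]...[/url] into an escaped <a> element.
            function bbcode_url_to_link(string $text): string {
                $pattern = '`\[url\]\s*((?:https?|ftp)://[^\s\[\]]+)\s*\[/url\]`i';
                return preg_replace_callback($pattern, function (array $m): string {
                    // escape &, <, > and both quote types so the URL cannot break out of href=""
                    $url = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
                    return '<a href="' . $url . '">' . $url . '</a>';
                }, $text);
            }

            echo bbcode_url_to_link('see [url]http://www.your-link.com[/url] ok'), "\n";
            ```

            Restricting the scheme to http, https and ftp in the pattern means that something like javascript: inside the tag is simply left untouched as plain text.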

            • I just read a post above asking whether strpos could work backwards. I don't think so, but one could emulate it by using strrev().

              function strpos_backwards($text, $pos) {
                // finds the position of the next space starting at the given $pos
                $pos = strpos( $text, ' ', $pos);
                // if $pos in the middle of last word, this will position $pos at the end of string
                if ($pos === FALSE) $pos = strlen($text);
                // substr chops the string behind new $pos and reverse the chopped string
                $reversed_text = strrev(substr($text, 0, $pos));
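                // it's here that strpos works backwards, i.e. forwards but on a reversed string
                $space = strpos($reversed_text, ' ');
                // no space means $pos was in the first word: keep the whole reversed string
                if ($space === FALSE) $space = strlen($reversed_text);
                // substring it until first occurrence of a space
                $out = substr($reversed_text, 0, $space);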
                // reverse back the found word and return it
                return strrev($out);
              }
              

              I benchmarked it against the solutions above and it's the fastest (twice as fast as my preg_split version and Laserlight's string function, which are equally fast).
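
              PHP also has strrpos(), which finds the last occurrence of a string, so the double strrev() can probably be avoided. The following is a minimal sketch under assumptions: the name word_at() is made up, $pos is assumed to lie within the string, and the text is assumed to be single-byte. No benchmark was run on this variant.

              ```php
              <?php
              // Hypothetical helper: returns the space-delimited word that contains position $pos.
              function word_at(string $text, int $pos): string {
                  // end of the word: next space at or after $pos, or the end of the string
                  $end = strpos($text, ' ', $pos);
                  if ($end === false) $end = strlen($text);
                  // start of the word: the character after the last space before $end
                  $head  = substr($text, 0, $end);
                  $space = strrpos($head, ' ');
                  $start = ($space === false) ? 0 : $space + 1;
                  return substr($head, $start);
              }

              echo word_at('The quick brown fox', 6), "\n";   // position 6 is inside "quick"
              ```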

              • Originally posted by laserlight
                Haven't tested your new version yet, but wouldn't end($out) be better than array_pop($out) in this case?

                It is indeed better, as there is no need to shorten the $out array.

                I guess we all have our (bad) habits.

                Thanks.
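
                The difference between the two calls can be seen in a small sketch (the sample array is made up):

                ```php
                <?php
                $out = array('the', 'quick', 'fox');

                // end() moves the internal pointer to the last element and returns it;
                // the array keeps all three elements.
                $last = end($out);
                $countAfterEnd = count($out);

                // array_pop() returns the same element but also removes it from the array.
                $popped = array_pop($out);
                $countAfterPop = count($out);

                var_dump($last, $countAfterEnd, $popped, $countAfterPop);
                ```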

                • Yes, there will always be 'special cases'.

                  Just for the sake of efficiency I tried to optimize my code above to make it run even faster.

                  function getWord4($text, $pos) {
                    $pos = strpos( $text, ' ', $pos);
                    if ($pos === FALSE) $pos = strlen($text);
                    // hyphen placed last in the class so it is read as a literal, not as a range
                    $out = preg_split("#[^\w'-]#", substr($text, 0, $pos), -1, PREG_SPLIT_NO_EMPTY);
                    return array_pop($out);
                  }

                  preg_split does the job faster, and strpos() is more efficient than my while() loop (which, by the way, had a bug when $pos was in the last word: the loop did not stop).

                  We all had fun with this thread, but what does the original poster think of all this?

                  • There's a bug with your solution though, in that if the position is in the middle of the word, not the whole word is returned (due to the substr())

                    True. If that's what he needs, it is easy to correct it by adding just one line:

                    function getWord2($text, $pos) {
                      if ($pos > strlen($text)) { 
                        return 'Text is much too short!'; 
                      }
                      while ($text[$pos] != ' ') $pos++;        //  <--------   added line
                      preg_match_all('#\b\w+#', substr($text, 0, $pos), $out);
                      return array_pop($out[0]);
                    } 

                    As for the "I'd" problem, PCRE syntax considers it two words, which is grammatically correct I guess (forgive me if I am wrong, English is not my native language!).

                    If not, change the regex pattern to #\b[\w']+#.

                    Et voilà. That's all there is to it!
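
                    The effect of the two patterns on a contraction can be checked with a short sketch (the sample sentence is made up):

                    ```php
                    <?php
                    $text = "I'd like tea";

                    // \w does not match the apostrophe, so "I'd" is split into "I" and "d".
                    preg_match_all('#\b\w+#', $text, $plain);

                    // Adding the apostrophe to the character class keeps "I'd" together.
                    preg_match_all('#\b[\w\']+#', $text, $withApostrophe);

                    print_r($plain[0]);
                    print_r($withApostrophe[0]);
                    ```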

                    • Just my two (euro) cent:

                      function getWord2($text, $pos) {
                        if ($pos > strlen($text)) { 
                          return 'Text is much too short!'; 
                        } 
                        preg_match_all('#\b\w+#', substr($text, 0, $pos), $out);
                        return array_pop($out[0]);
                      } 

                      Using the \b assertion (which consumes no characters) does the trick.

                      Edit: This BBCode doesn't like the regex pattern, hence the CODE tag!

                      • Preg_replace would certainly get the job done but string functions are less greedy! Try this:

                        $txt='http://mysite.com/dir/index.htm';
                        $len=strlen($txt);
                        if ($txt[$len-1]!='/') $txt.='/';
                        echo $txt;
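
                        The same check can be wrapped in a small function using substr() with a negative offset, which avoids computing the length by hand. This is just a sketch; the name ensure_trailing_slash() is made up.

                        ```php
                        <?php
                        // Hypothetical helper: appends "/" only when the string does not already end with one.
                        function ensure_trailing_slash(string $url): string {
                            return substr($url, -1) === '/' ? $url : $url . '/';
                        }

                        echo ensure_trailing_slash('http://mysite.com/dir/index.htm'), "\n";
                        echo ensure_trailing_slash('http://mysite.com/dir/'), "\n";
                        ```
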
                        • Good morning,

                          This should also work.

                          $string = '<link href="css/test.css" 
                          rel="stylesheet"><script language="JavaScript" 
                          src="js/test1.js"></script><script language="JavaScript" 
                          src="js/test2.js"></script>'; 
                          
                          $pattern='#((?:href|src)=[\'"])([^\'"]*)#i';
                          
                          $replc='$1/home/$2';
                          
                          echo htmlentities(preg_replace($pattern, $replc, $string));
                          • Another version of Uffi's post:

                            Of course, you will have to remove all spaces in the [tags] from the variable $txt

                            $txt='here is your text to be parsed';
                            $in=array(
                            '#\[URL([^]]*)\]([^[]*)\[/URL\]#',
                            // closing quote moved outside the capture so $1 holds only the address
                            '#\[EMAIL="([^"]*)"\s*\]([^[]*)\[/EMAIL\]#');
                            
                            $out=array(
                            '<a href$1 target = "_blank">$2</a>',
                            '<a href="mailTo:$1">$2</a>');
                            
                            echo preg_replace($in, $out, $txt);
                            • $data = '<meta name="keywords" content="php, mysql, php templates, 
                                       apache, php manual, server, pdf, database, flash, phpbuilder, 
                                       content management system, sql, script, oracle, string, xml, 
                                       regular expressions, php5, webalizer, php tutorials, code, 
                                       nusoap, classes, developers"><meta name="description" 
                                       content="some description about php and other stuff.">';
                              
                              // captures required values and variable names         
                              preg_match_all('#<meta name="([^"]*)"[^"]*"([^"]*)#', $data, $out);

                              // stores values in variables $keywords and $description
                              // or any other meta name found.
                              foreach ($out[1] as $k=>$v) $$v=$out[2][$k];
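
                              Variable variables ($$v) can overwrite existing variables when the meta names come from a page you do not control. A possible alternative, sketched here with a shortened sample input that is only an assumption, keeps the values in one array keyed by meta name:

                              ```php
                              <?php
                              // Short sample input (assumed); same tag shape as in the post.
                              $data = '<meta name="keywords" content="php, mysql">'
                                    . '<meta name="description" content="some description">';

                              preg_match_all('#<meta name="([^"]*)"[^"]*"([^"]*)#', $data, $out);

                              // Pair each captured name with its captured content value.
                              $meta = array_combine($out[1], $out[2]);

                              print_r($meta);
                              ```

                              The values are then read as $meta['keywords'] and $meta['description'].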