sneakyimp;10879815 wrote:

You'd get double spaces: "I think erego I am" would return "I think  I am" (with two spaces).

Ah yes, I see what you mean. My bad.

Cheers,

NRG

    This pattern's ugly, but does it in one shot - just an extra option that will suck the spaces before and after if it's at the end of the line. Note YourWord is what you'd put in with the preg_quote bit, aye?

    /(\bYourWord\b\s?|\s+\bYourWord\b\s?$)/i

    I'd like a nicer pattern without the or, but can't muster it at the mo.
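A minimal sketch of how that pattern might be wired up for a single word. The helper name `strip_word` is made up for illustration, and the test strings are borrowed from elsewhere in this thread:

```php
<?php
// Hypothetical helper: removes one whole word using the alternation pattern above.
function strip_word($word, $text) {
    // Escape the word, including the '/' delimiter used by the pattern.
    $quoted = preg_quote($word, '/');
    // First branch: the word plus one trailing space.
    // Second branch: the word at the very end, plus the space(s) in front of it.
    $pattern = '/(\b' . $quoted . '\b\s?|\s+\b' . $quoted . '\b\s?$)/i';
    return preg_replace($pattern, '', $text);
}

echo "'" . strip_word('erego', 'I think erego I am') . "'\n";              // 'I think I am'
echo "'" . strip_word('heretofore', 'I was satisfied heretofore') . "'\n"; // 'I was satisfied'
echo "'" . strip_word('heretofore', 'Heretofore unknown') . "'\n";         // 'unknown'
```

Without the second branch, the end-of-line case would leave a trailing space behind.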

      nrg_alpha;10879817 wrote:

      Hi Brad..

      Maybe I'm misunderstanding something.. going by the OP's goal:
      'I've been instructed to write a routine that strips certain words from a text string',

      would the str_ireplace routine not suffice?

      Cheers,

      NRG

      The key word in the quote there is words.

      The "clbuttic" mistake is that you decide to remove the vulgar word "ass" from a user's input, but instead of just stripping it out you decide to be humorous and replace it with the less offensive word "butt". So you simply do a str_ireplace() to replace "ass" with "butt".

      Hence the word classic becomes "clbuttic." Search Google for this term, or any other word that has "ass" in it replaced with "butt" - it's a clbuttic, I mean classic, mistake.

      (Thanks to Weedpacket in a previous thread for this example - it vaguely sounded familiar when he mentioned the word, but after a quick Google it all came back to me... very funny example of programming-gone-awry.)

        bradgrafelman;10879822 wrote:

        The key word in the quote there is words.

        The "clbuttic" mistake is that you decide to remove the vulgar word "ass" from a user's input, but instead of just stripping it out you decide to be humorous and replace it with the less offensive word "butt". So you simply do a str_ireplace() to replace "ass" with "butt".

        Hence the word classic becomes "clbuttic." Search Google for this term, or any other word that has "ass" in it replaced with "butt" - it's a clbuttic, I mean classic, mistake.

        (Thanks to Weedpacket in a previous thread for this example - it vaguely sounded familiar when he mentioned the word, but after a quick Google it all came back to me... very funny example of programming-gone-awry.)

        Ohh.. I thought you were referring to Sneakyimp's problem in THIS thread.. sorry..

        Cheers,

        NRG

          Sneakyimp, my solution does not create two spaces when I tested it.. the '' that does the replacing does not create an extra space.. it is 'nothing'.. when I try the following code, it works:

          function remove_words($str) {
             $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE'); 
             $str = str_ireplace($remove, '', $str); 
             return $str; 
          }
          
          $test = 'I am SOMNAMBULIST therefore I am!';
          echo $test . '<br />';
          echo remove_words($test);
          

          Cheers,

          NRG
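One thing worth checking here, as a quick sketch: `str_ireplace()` removes only the word itself, so the space on each side of it stays in the string. Echoing into a browser can hide this because HTML collapses runs of whitespace, which may be why the output looked fine. The variable names below are just for illustration:

```php
<?php
$test = 'I am SOMNAMBULIST therefore I am!';
$out  = str_ireplace('SOMNAMBULIST', '', $test);

// Wrap the result in quotes so the raw spacing is visible on the command line.
echo "'" . $out . "'\n";                 // 'I am  therefore I am!'

// Look for two consecutive spaces in the raw string.
var_dump(strpos($out, '  ') !== false);  // bool(true)
```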

            nrg_alpha;10879823 wrote:

            Ohh.. I thought you were referring to Sneakyimp's problem in THIS thread.. sorry..

            I was/am - it's the same concept.

              Brad, I don't think sneakyimp is trying to replace one word for another...if I understand correctly, he just wants to strip out specific words...

              so this is not the same as in the link I initially posted (which yes, I agree with what you and Weedpacket are saying).

              But in THIS case (this thread), I fail to see the usage of 'clbuttic' when the goal here is simply to remove words.. not replace them with other words.. Unless I am very seriously misunderstanding everything here..

              Cheers,

              NRG

                nrg_alpha,

                It's pretty simple.. you want to remove full words. Your solution will deform one word into another. For example:

                remove_words('I think neweregoword I am');

                Would yield "I think newword I am", deforming my word "neweregoword". Hence clbuttic = classic

                  m@tt;10879829 wrote:

                  nrg_alpha,

                  It's pretty simple.. you want to remove full words. Your solution will deform one word into another.

                  In the test from the last code I posted, it deforms (replaces) 'one successfully found criteria' for another, if that's what you mean.. in this case, due to an array of 'words' from the $remove array, this is replaced with ''.

                  Is this not in essence 'removing' a word? When I examine the string after it has been passed through the function, the word that is not supposed to be there isn't there.

                  m@tt;10879829 wrote:

                  For example:

                  remove_words('I think neweregoword I am');

                  Would yield "I think newword I am", deforming my word "neweregoword". Hence clbuttic = classic

                  Perhaps I'm new to this clbuttic thing.. but doesn't the replacement word have to actually be a word instead of ''? Or do you mean that by doing any form of replacement, the 'context' of the string is altered, which can result in a clbuttic situation?

                  Cheers,

                  NRG

                    The whole point we're trying to make in suggesting regular expressions over a simple str_ireplace() is that regular expressions can make the distinction between words and a string of characters within a word.

                    For example, using str_ireplace(), try to remove the offensive word "ass" from this string: Calling someone an ass can be classified as quite offensive. -- and then ask yourself what "clified" means.
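A quick sketch of that comparison, using the sentence from the post above. The variable names and the exact regex are only an illustration of the word-boundary idea, not a finished filter:

```php
<?php
$text = 'Calling someone an ass can be classified as quite offensive.';

// Plain substring replacement also hits the "ass" inside "classified".
$naive = str_ireplace('ass', '', $text);

// \b anchors the match to word boundaries, so only the standalone word is removed
// (plus one trailing space, to avoid leaving a double space behind).
$regex = preg_replace('/\bass\b\s?/i', '', $text);

echo $naive . "\n"; // Calling someone an  can be clified as quite offensive.
echo $regex . "\n"; // Calling someone an can be classified as quite offensive.
```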

                      Point taken. I now understand completely. Just by the example words the OP used, str_ireplace() worked perfectly. But given your latest example, it does not.

                      So I concede.. preg it is. (and now I realise fully the definition of clbuttic).

                      Sometimes the simplest examples hammer home the point the hardest.

                      Cheers,

                      NRG

                        $patterns[] = '/(^)?\s?\b'.preg_quote($rm, '/').'\b(?(1)\s?)/i';

                        Just for the fun of it. My previous pattern left spaces at the end of lines if the word occurred at the end of a line, which you could get rid of by modifying the pattern to b[/b] but the one shown above doesn't have the repeat. The only side effect of this new pattern is that it essentially does a left trim.
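For reference, a minimal sketch of that conditional pattern dropped into a complete function. The function name `remove_words_conditional` and passing the word list as a parameter are assumptions made for this example; the word list itself comes from earlier in the thread:

```php
<?php
// Hypothetical wrapper around the conditional pattern shown above.
function remove_words_conditional($str, array $remove) {
    $patterns = array();
    foreach ($remove as $rm) {
        // (^)? records whether the match starts at the beginning of the string;
        // (?(1)\s?) takes a trailing space only when it did.
        $patterns[] = '/(^)?\s?\b' . preg_quote($rm, '/') . '\b(?(1)\s?)/i';
    }
    return preg_replace($patterns, '', $str);
}

$words = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');

echo "'" . remove_words_conditional('I think erego I am', $words) . "'\n";            // 'I think I am'
echo "'" . remove_words_conditional('Heretofore unknown', $words) . "'\n";            // 'unknown'
echo "'" . remove_words_conditional('I think heretofore erego i am', $words) . "'\n"; // 'I think i am'
echo "'" . remove_words_conditional('I was satisfied heretofore', $words) . "'\n";    // 'I was satisfied'
```

Note that `^` here only matches at the start of the whole string, since the pattern does not use the `m` modifier.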

                          ok so let me see...

                          function remove_words($str) {
                              $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');
                          
                              $patterns = array();
                              foreach ($remove as $rm) {
                                  $quoted = preg_quote($rm, '/');
                                  $patterns[] = '/(\b' . $quoted . '\b\s?|\s?\b' . $quoted . '\b\s?(?=$|\n))/i';
                              }
                          
                              return preg_replace($patterns, '', $str);
                          } // remove_words()
                          
                          echo "'" . remove_words('I think erego I am') . "'\n"; // needs a space!
                          echo "'" . remove_words('Heretofore unknown') . "'\n"; // works great 
                          echo "'" . remove_words('I think heretofore erego i am') . "'\n"; // works great 
                          echo "'" . remove_words('I was satisfied heretofore') . "'\n"; // works great 
                          

                          I believe I've covered all the boundary conditions in the examples and it seems to work:

                          'I think I am'
                          'unknown'
                          'I think i am'
                          'I was satisfied'
                          

                          I think you nailed it, drakla. The expression itself is a bit scary to me (this is to be expected from the undead, I suppose) so I think I'll stick with my previous function, which I can grasp a little better.

                          The scariest parts are all those question marks. I'm not really sure what they do.

                          Thanks for the valiant effort guys!

                            I was thinking of putting an explanation of the pattern in for this one

                            (^)?\s?\bYourWord\b(?(1)\s?)

                            The basics of it are that you test if you're at the start of a line, and to do that you use the ^ and put it into brackets, which, as those who've done a bit of regex will know, also creates a capturing pattern with 1 as its id. But it must be optional, and that's what the question mark does.

                            (^)? checks if we're at the start of the line [the ^], does a capture so we can check it later [the brackets], but makes it optional [the question mark]

                            \s?\bYourWord\b is grab a space before the word if it's there [\s?], and the word itself

                            The last bit tests whether the optional start of line pattern actually did capture anything, and if so also says take whitespace from after the word

                            (?(1)\s?) means did 1 capture anything? If so optionally scoop up another space.

                            So the logic of the whole expression is always grab a space from in front of the word, but if you're at the start of a line then also grab a space after.

                            That's going to be unintelligible drivel, isn't it. This will probably help more:
                            http://www.regular-expressions.info/conditional.html
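If it helps, here is a small sketch that shows what the pattern actually grabs in the two situations described above. The variable names are made up for the example, and "erego" stands in for YourWord:

```php
<?php
$pattern = '/(^)?\s?\berego\b(?(1)\s?)/i';

// Word at the start: group 1 takes part in the match, so the space AFTER the word is taken.
preg_match($pattern, 'erego I am', $atStart);

// Word mid-line: group 1 does not take part, so only the space BEFORE the word is taken.
preg_match($pattern, 'I think erego I am', $midLine);

echo "[" . $atStart[0] . "]\n"; // [erego ]
echo "[" . $midLine[0] . "]\n"; // [ erego]
```

The square brackets in the output are only there to make the captured space visible.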
