I've been instructed to write a routine that strips certain words from a text string. I concocted this function, but I'm having trouble handling the word delimiters correctly: a word may occur at the beginning of the string (in which case I want NO delimiter to remain) or between two other words (in which case I want to leave exactly one delimiter).

I was wondering if this might be possible with a single preg_replace() call, or whether I'll have to get all fancy with preg_split() and whatnot.

function remove_words($str) {
	$remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');

	$patterns = array();
	foreach ($remove as $rm) {
		// pass the '/' delimiter so preg_quote() escapes it too
		$patterns[] = '/(^|\s)' . preg_quote($rm, '/') . '($|\s)/i';
	}

	return preg_replace($patterns, '', $str);
} // remove_words()

echo "'" . remove_words('I think erego I am') . "'<br>"; // needs a space!
echo "'" . remove_words('Heretofore unknown') . "'<br>"; // works great
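A quick way to see why the middle case loses its space is to look at what the pattern actually matched. This is a sketch, not part of the original post; it uses only preg_match() and preg_replace() and shows that (^|\s) and ($|\s) each consume a space, so a mid-sentence word is removed together with the delimiter on both sides.

```php
<?php
// The OP's pattern for one word; preg_quote() is given the '/' delimiter.
$pattern = '/(^|\s)' . preg_quote('EREGO', '/') . '($|\s)/i';

// $m[0] holds the whole match: the word plus the space on BOTH sides.
preg_match($pattern, 'I think erego I am', $m);
var_dump($m[0]); // string(7) " erego "

// Replacing that match with '' therefore glues "think" and "I" together.
$middle = preg_replace($pattern, '', 'I think erego I am');
echo $middle, "\n"; // I thinkI am

// At the start of the string, (^|\s) matches the empty string, so only the
// trailing space is consumed and no delimiter is left behind.
$start = preg_replace($pattern, '', 'Erego I am');
echo $start, "\n"; // I am
```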

    Is /\bYourWord\b\s?/ too simple? This presumes you want the word removed only when it stands on its own, not when it is part of another word, and that you want to gobble up only the one space after it.

    EDIT: You didn't say if the word could occur at the end...
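    For reference, a minimal sketch of that suggestion dropped into the OP's function (renamed remove_words_boundary(), a hypothetical name, so it stands alone). It also shows the end-of-string case: \s? is optional, so the word still matches at the end, but the space before it is left behind as a trailing space.

```php
<?php
// Drakla's suggestion in the OP's function: match the word only between
// word boundaries (\b), then swallow at most one whitespace char after it.
function remove_words_boundary($str) {
    $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');
    $patterns = array();
    foreach ($remove as $rm) {
        $patterns[] = '/\b' . preg_quote($rm, '/') . '\b\s?/i';
    }
    return preg_replace($patterns, '', $str);
}

var_dump(remove_words_boundary('I think erego I am'));         // string(12) "I think I am"
var_dump(remove_words_boundary('Heretofore unknown'));         // string(7) "unknown"
// At the end of the string there is no space after the word to swallow,
// so the space before it survives as a trailing space.
var_dump(remove_words_boundary('I was satisfied heretofore')); // string(16) "I was satisfied "
```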

      Would /\bYourWord\b\s?/ match 'A sentence that ends with YourWord'?

      I ended up writing this routine, which seems to do the trick:

      function remove_words($str) {
      	$remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');
      	// split on any run of whitespace
      	$words = preg_split('/\s+/', $str);
      	$new_words = array();
      	foreach ($words as $word) {
      		// keep non-empty words that are not on the list (case-insensitive)
      		if ($word !== '' && !in_array(strtoupper($word), $remove)) {
      			$new_words[] = $word;
      		}
      	}

      	// rejoin the surviving words with exactly one space
      	return implode(' ', $new_words);
      }
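      Two properties of this split-and-join approach are worth knowing about: because it splits on whitespace and rejoins with a single space, every run of whitespace (newlines included) comes out as one space, and a word with punctuation attached, such as "erego,", is not an exact match for anything in the list. A short sketch demonstrating both, with the function renamed remove_words_split() (a hypothetical name) so it stands alone:

```php
<?php
// The OP's split-and-filter routine under a hypothetical name.
function remove_words_split($str) {
    $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');
    $new_words = array();
    foreach (preg_split('/\s+/', $str) as $word) {
        if ($word !== '' && !in_array(strtoupper($word), $remove)) {
            $new_words[] = $word;
        }
    }
    return implode(' ', $new_words);
}

var_dump(remove_words_split('I was satisfied heretofore')); // string(15) "I was satisfied"
// Side effect: every run of whitespace, including newlines, becomes one space.
var_dump(remove_words_split("line one\n\nline  two"));      // string(17) "line one line two"
// Punctuation stays attached to the word, so "erego," is not removed.
var_dump(remove_words_split('I think, erego, I am'));       // string(20) "I think, erego, I am"
```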
      

        Perhaps this?

        function remove_words($str) {
           $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');
           $str = str_ireplace($remove, '', $str);
           return $str;
        }
        

        Would this not do the trick? This is similar to a thread I started in the Code-Critique forum:
        http://www.phpbuilder.com/board/showthread.php?p=10879661#post10879661

        It seems that if you can get away without resorting to regular expressions, the code executes faster (in my experience there is a difference: not a huge one, but a difference nonetheless).

        Cheers,

        NRG

          You'd get double spaces: "I think erego I am" would return "I think  I am" (with two spaces).

            nrg_alpha: that type of code is exactly what ends up making the "clbuttic" mistake. I'd personally go with Drakla's suggestion of using the \b word-boundary markers in a regular-expression find/replace.

            EDIT: If it's the spaces that are a problem, you could try:

            $text = preg_replace('/\s?yourword\s?/i', '', $text);
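            A quick check of that pattern as a sketch, with 'erego' standing in for yourword: both \s? can match, so a word in the middle takes both of its spaces with it (the same problem as the OP's original), and without \b the letters are also cut out of longer words.

```php
<?php
// Brad's pattern with 'erego' standing in for "yourword".
$pattern = '/\s?erego\s?/i';

// Both optional \s? match here, so "think" and "I" end up glued together.
$glued = preg_replace($pattern, '', 'I think erego I am');
echo $glued, "\n"; // I thinkI am

// With no \b anchors, the letters are also cut out of a longer word.
$cut = preg_replace($pattern, '', 'I think neweregoword I am');
echo $cut, "\n"; // I think newword I am
```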

              Hi Brad..

              Maybe I'm misunderstanding something... going by the OP's goal:
              'I've been instructed to write a routine that strips certain words from a text string',

              would the str_ireplace routine not suffice?

              Cheers,

              NRG

                sneakyimp;10879815 wrote:

                You'd get double spaces: "I think erego I am" would return "I think  I am" (with two spaces).

                Ah yes, I see what you mean. My bad.

                Cheers,

                NRG

                  This pattern's ugly, but it does it in one shot: there's an extra alternative that sucks up the whitespace before the word (and any after it) when the word is at the end of the line. Note that YourWord is where the preg_quote()'d word goes, aye?

                  /(\bYourWord\b\s?|\s+\bYourWord\b\s?$)/i

                  I'd like a nicer pattern without the alternation (the |), but can't muster one at the mo.
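                  Here is that pattern dropped into the OP's function as a sketch (remove_words_alt() is a hypothetical name; preg_quote() is given the '/' delimiter). The first alternative handles a word at the start or in the middle; the second only matches when the word ends the string, and then takes the whitespace in front of it.

```php
<?php
// Drakla's one-shot pattern: either the word plus one following space,
// or (when it ends the string) the whitespace before it plus the word.
function remove_words_alt($str) {
    $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');
    $patterns = array();
    foreach ($remove as $rm) {
        $w = preg_quote($rm, '/');
        $patterns[] = '/(\b' . $w . '\b\s?|\s+\b' . $w . '\b\s?$)/i';
    }
    return preg_replace($patterns, '', $str);
}

var_dump(remove_words_alt('I think erego I am'));         // string(12) "I think I am"
var_dump(remove_words_alt('Heretofore unknown'));         // string(7) "unknown"
var_dump(remove_words_alt('I was satisfied heretofore')); // string(15) "I was satisfied"
```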

                    nrg_alpha;10879817 wrote:

                    Hi Brad..

                    Maybe I'm misunderstanding something... going by the OP's goal:
                    'I've been instructed to write a routine that strips certain words from a text string',

                    would the str_ireplace routine not suffice?

                    Cheers,

                    NRG

                    The key word in the quote there is "words".

                    The "clbuttic" mistake is that you decide to remove the vulgar word "ass" from a user's input, but instead of just stripping it out you decide to be humorous and replace it with the less offensive word "butt". So you simply do a str_ireplace() to replace "ass" with "butt".

                    Hence the word classic becomes "clbuttic." Search Google for this term, or any other word that has "ass" in it replaced with "butt" - it's a clbuttic, I mean classic, mistake.

                    (Thanks to Weedpacket in a previous thread for this example - it vaguely sounded familiar when he mentioned the word, but after a quick Google it all came back to me... very funny example of programming-gone-awry.)
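                    To make the difference concrete, a small sketch (the sentence is made up for illustration): str_ireplace() works on raw substrings, while preg_replace() with \b word boundaries only touches the standalone word.

```php
<?php
// A made-up sentence containing "ass" both as a word and inside "classic".
$text = 'He made a classic ass of himself.';

// str_ireplace() matches raw substrings, so "classic" is hit as well.
$naive = str_ireplace('ass', 'butt', $text);
echo $naive, "\n"; // He made a clbuttic butt of himself.

// \b anchors restrict the match to the standalone word.
$bounded = preg_replace('/\bass\b/i', 'butt', $text);
echo $bounded, "\n"; // He made a classic butt of himself.
```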

                      bradgrafelman;10879822 wrote:

                      The key word in the quote there is words.

                      Ohh.. I thought you were referring to Sneakyimp's problem in THIS thread.. sorry..

                      Cheers,

                      NRG

                        Sneakyimp, my solution did not create two spaces when I tested it... the '' that does the replacing does not create an extra space, it is 'nothing'. When I try the following code, it works:

                        function remove_words($str) {
                           $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE'); 
                           $str = str_ireplace($remove, '', $str); 
                           return $str; 
                        }
                        
                        $test = 'I am SOMNAMBULIST therefore I am!';
                        echo $test . '<br />';
                        echo remove_words($test);
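                        A likely reason the double space didn't show up in this test: the output is echoed into an HTML page (note the '<br />'), and browsers collapse runs of whitespace when rendering. Making the spaces visible, as in this sketch (remove_words_str() is just the function above under a hypothetical name), shows they are still there:

```php
<?php
// NRG's str_ireplace() version under a hypothetical name.
function remove_words_str($str) {
    $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');
    return str_ireplace($remove, '', $str);
}

$out = remove_words_str('I am SOMNAMBULIST therefore I am!');

// Replace each space with "_" so it can't be collapsed when displayed.
echo str_replace(' ', '_', $out), "\n"; // I_am__therefore_I_am!
var_dump(strpos($out, '  ') !== false);  // bool(true): a double space is present
```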
                        

                        Cheers,

                        NRG

                          nrg_alpha;10879823 wrote:

                          Ohh.. I thought you were referring to Sneakyimp's problem in THIS thread.. sorry..

                          I was/am - it's the same concept.

                            Brad, I don't think sneakyimp is trying to replace one word with another... if I understand correctly, he just wants to strip out specific words...

                            so this is not the same as in the link I initially posted (where, yes, I agree with what you and Weedpacket are saying).

                            But in THIS case (this thread), I fail to see how 'clbuttic' applies when the goal here is simply to remove words, not replace them with other words... unless I am very seriously misunderstanding everything here.

                            Cheers,

                            NRG

                              nrg_alpha,

                              It's pretty simple.. you want to remove full words. Your solution will deform one word into another. For example:

                              remove_words('I think neweregoword I am');

                              Would yield "I think newword I am", deforming my word "neweregoword". Hence clbuttic = classic.

                                m@tt;10879829 wrote:

                                nrg_alpha,

                                It's pretty simple.. you want to remove full words. Your solution will deform one word into another.

                                In the test from the last code I posted, it replaces 'one successfully found criterion' with another, if that's what you mean... in this case, each 'word' from the $remove array is replaced with ''.

                                Is this not in essence 'removing' a word? When I examine the string after it has been passed through the function, the word that is not supposed to be there isn't there.

                                m@tt;10879829 wrote:

                                For example:

                                remove_words('I think neweregoword I am');

                                Would yield "I think newword I am", deforming my word "neweregoword". Hence clbuttic = classic.

                                Perhaps I'm new to this clbuttic thing... but doesn't the replacement have to actually be a word instead of ''? Or do you mean that any form of replacement alters the 'context' of the string, which can result in a clbuttic situation?

                                Cheers,

                                NRG

                                  The whole point we're trying to make in suggesting regular expressions over a simple str_ireplace() is that regular expressions can make the distinction between words and a string of characters within a word.

                                  For example, using str_ireplace(), try to remove the offensive word "ass" from this string: Calling someone an ass can be classified as quite offensive. -- and then ask yourself what "clified" means.
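                                  Brad's example as a runnable sketch, contrasting substring removal with a word-boundary removal that also swallows one following space:

```php
<?php
$text = 'Calling someone an ass can be classified as quite offensive.';

// Substring removal: "ass" is also cut out of "classified".
$substr = str_ireplace('ass', '', $text);
echo $substr, "\n"; // Calling someone an  can be clified as quite offensive.

// Word-boundary removal (plus one following space): "classified" survives.
$words = preg_replace('/\bass\b\s?/i', '', $text);
echo $words, "\n"; // Calling someone an can be classified as quite offensive.
```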

                                    Point taken. I now understand completely. With the example words the OP used, str_ireplace() worked perfectly, but given your latest example, it does not.

                                    So I concede.. preg it is. (and now I realise fully the definition of clbuttic).

                                    Sometimes the simplest examples hammer home the point the hardest.

                                    Cheers,

                                    NRG

                                      $patterns[] = '/(^)?\s?\b'.preg_quote($rm).'\b(?(1)\s?)/i';

                                      Just for the fun of it. My previous pattern left spaces at the end of lines if the word occurred at the end of a line, which you could get rid of by modifying the pattern to b[/b], but the one shown above doesn't have the repeat. The only side effect of this new pattern is that it essentially does a left trim.
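                                      How the conditional appears to work: (^)? captures group 1 only when the match starts at the very beginning of the string, and (?(1)\s?) is a PCRE conditional that tries \s? only if group 1 was set. So a word at the start takes the space after it, while anywhere else the leading \s? has already taken the space before it. A sketch of it in the OP's function (remove_words_cond() is a hypothetical name):

```php
<?php
// Drakla's conditional pattern: group 1, (^), is set only for a match at the
// very start; (?(1)\s?) then eats one trailing space only in that case.
function remove_words_cond($str) {
    $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');
    $patterns = array();
    foreach ($remove as $rm) {
        $patterns[] = '/(^)?\s?\b' . preg_quote($rm, '/') . '\b(?(1)\s?)/i';
    }
    return preg_replace($patterns, '', $str);
}

echo remove_words_cond('I think erego I am'), "\n";         // I think I am
echo remove_words_cond('Heretofore unknown'), "\n";         // unknown
echo remove_words_cond('I was satisfied heretofore'), "\n"; // I was satisfied
```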

                                        ok so let me see...

                                        function remove_words($str) {
                                            $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');

                                            $patterns = array();
                                            foreach ($remove as $rm) {
                                                $quoted = preg_quote($rm, '/');
                                                // either the word plus one trailing space, or (at the end of a line)
                                                // the space before the word plus the word
                                                $patterns[] = '/(\b' . $quoted . '\b\s?|\s?\b' . $quoted . '\b\s?(?=$|\n))/i';
                                            }

                                            return preg_replace($patterns, '', $str);
                                        } // remove_words()

                                        echo "'" . remove_words('I think erego I am') . "'\n"; // word in the middle
                                        echo "'" . remove_words('Heretofore unknown') . "'\n"; // word at the start
                                        echo "'" . remove_words('I think heretofore erego i am') . "'\n"; // two words in a row
                                        echo "'" . remove_words('I was satisfied heretofore') . "'\n"; // word at the end
                                        

                                        I believe I've covered all the boundary conditions in the examples and it seems to work:

                                        'I think I am'
                                        'unknown'
                                        'I think i am'
                                        'I was satisfied'
                                        

                                        I think you nailed it, Drakla. The expression itself is a bit scary to me (this is to be expected from the undead, I suppose), so I think I'll stick with my previous function, which I can grasp a little better.

                                        The scariest parts are all those question marks. I'm not really sure what they do.

                                        Thanks for the valiant effort guys!
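                                        On the question marks: in this pattern they play two different roles. After \s, ? means "optional" (zero or one whitespace character), while (?=...) opens a lookahead, which checks what follows without consuming it. Below is a sketch of the same final pattern written with the x modifier, where whitespace in the pattern is ignored and # starts a comment, so each piece can be annotated (remove_words_annotated() is a hypothetical name; the behaviour should be identical, assuming the listed words contain no spaces).

```php
<?php
// The final pattern from this thread, rewritten with the x modifier.
// Note: comments must not contain the "/" delimiter or a single quote.
function remove_words_annotated($str) {
    $remove = array('SOMNAMBULIST', 'EREGO', 'HERETOFORE');
    $patterns = array();
    foreach ($remove as $rm) {
        $w = preg_quote($rm, '/'); // assumes single words with no spaces
        $patterns[] = '/
            (                   # group 1 holds one of two alternatives
                \b' . $w . '\b  # the whole word; \b marks a word boundary
                \s?             # ? here means optional: at most one whitespace char
            |                   # or, for a word at the end of a line:
                \s?             # an optional whitespace char BEFORE the word
                \b' . $w . '\b
                \s?             # an optional whitespace char after it
                (?=$|\n)        # (?=...) is a lookahead: the end of the string or a
                                # newline must follow, but it is not removed
            )
        /ix';
    }
    return preg_replace($patterns, '', $str);
}

foreach (array('I think erego I am', 'Heretofore unknown',
               'I think heretofore erego i am', 'I was satisfied heretofore') as $s) {
    echo "'" . remove_words_annotated($s) . "'\n";
}
// 'I think I am'
// 'unknown'
// 'I think i am'
// 'I was satisfied'
```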
