Hi all,

I've got a textarea input that I'd like to search for certain words, and strip them out or replace them with ''.
What I've got at the moment is giving this error:

Warning: preg_replace() [function.preg-replace]: Delimiter must not be alphanumeric or backslash

From this code:

$wordsArray = array("http","www",".com",".net","dot com","dot net");
    $arr = explode(" ", strtolower($textArea));
    $arrCntr = 0;
    foreach ($arr as $info) {
        if (in_array($info,$wordsArray)) {
            $textArea = preg_replace($wordsArray[$arrCntr], '', $textArea);
        }
        $arrCntr++;
    }

Note: I'd like to do more than simply strip out any html tags and leave behind a web url or a partial url.

Basically, if someone enters www.a.com or www dot a dot com, I'd like to strip out the "www" and the ".com" or "dot com" and not affect a phrase like "polka dot complete" (not sure why that would be entered, but you get the point 🙂 ). So in that case it needs to match the specific whole phrase "dot com" only.

I'm not too worried about performance as I don't anticipate this code being executed more than 1,000 times a day and the textarea is limited to 250 characters so the input string isn't too big.

Thanks.

    Since you're not using regular expressions, consider using [man]str_replace[/man] (or [man]str_ireplace[/man]) instead.

      Getting closer, thanks for the suggestion!

      Now I have:

          $wordsArray = array('http','www','.com','.net','dot com','dot net');
          $textArea = str_replace($wordsArray,'',$textArea);
      

      which does replace those items. However, two things are happening:
      1. If "dot com" is typed in over two lines (meaning, if I type "dot" near the end of the line, press enter, and then type "com" on the next line), it doesn't get replaced. I'm guessing that's because there is a newline character in between them. How do I take that into account?
      2. If I type in "dot common", the "dot com" is replaced with '', leaving com only. In these cases, I'd like to match the whole string "dot com" only and not those letters in combination with others. I tried changing the array to use ' dot com ' (with spaces before and after) but that doesn't work either.

      Ideas?

        Sorry, I mistyped:

        2. If I type in "dot common", the "dot com" is replaced with '', leaving mon only.

          If I replace the single quotes '' with double quotes "" around the words in the array, " dot com " now only removes it if it's dot com and not dot common. That fixes issue #2 above.

          Still trying to figure out how to take a newline into consideration for issue #1.

            Actually, regular expressions would be a bit more appropriate here, since you can insist that a match only occurs at the start or end of a word (or not) using the word boundary escape sequence.
            And for your first problem, see PCRE Delimiters.

            Note that [font=monospace].[/font] has a special meaning in PCRE, so if you want to match it then it will need to be escaped also.

            And if you want any whitespace to be matched - newlines as well as spaces (and tabs also for that matter) - then that's another escape sequence.
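
            As a rough sketch of how those pieces fit together (the sample input and the $cleaned variable are made up for illustration): the original warning appears because preg_replace() treats the first character of a pattern such as "http" as its delimiter, and a letter is not allowed there. Wrapping the pattern in a delimiter such as / avoids that, \b marks a word boundary, \s+ matches any run of whitespace including newlines, and preg_quote() escapes a literal dot.

            ```php
            <?php
            // Hypothetical input: "dot" and "com" are split across a line break.
            $textArea = "visit example dot\ncom today, or polka dot common";

            // "/" is the delimiter, \b is a word boundary, \s+ is one or more
            // whitespace characters (space, tab, newline), "i" ignores case.
            $pattern = '/\bdot\s+com\b/i';
            $cleaned = preg_replace($pattern, '', $textArea);

            // A literal "." has to be escaped; preg_quote() does that for us.
            $dotCom = '/' . preg_quote('.com', '/') . '\b/i';   // same as '/\.com\b/i'
            $cleanedUrl = preg_replace($dotCom, '', 'www.a.com');
            ```

            With this input, the "dot com" split over two lines is removed while "dot common" is left alone, because there is no word boundary between "com" and the following "m".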

              Thanks Weedpacket. I actually tried that first (using preg_match and if found preg_replace) but couldn't get it to work. This str_replace gets me most of the way there. Not ideal, but it'll do for now.

                What did you try? Sounds like what you're wanting to do is surround the phrases you've got above with PCRE's \b ("word boundary").

                Also note that there's no reason to first do a match followed by a replace - just do the replace; if it doesn't find anything, then it won't replace anything.

                  Oh, and another thing to note: both [man]preg_replace[/man] and [man]str_replace[/man] can search/replace a whole array of strings at once - no need for a loop to do them one at a time.
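
                  For example, the word list from earlier in the thread could be turned into an array of patterns and handed to a single preg_replace() call. This is only a sketch: the $patterns and $cleaned names and the sample input are assumptions, and the rule of adding \b only next to a letter or digit is one possible choice, not the only one.

                  ```php
                  <?php
                  $wordsArray = array('http', 'www', '.com', '.net', 'dot com', 'dot net');

                  $patterns = array();
                  foreach ($wordsArray as $word) {
                      // Escape regex metacharacters (such as ".") and the "/" delimiter.
                      $quoted = preg_quote($word, '/');
                      // Let a space in the phrase match any whitespace, including newlines.
                      $quoted = str_replace(' ', '\s+', $quoted);
                      // Only add \b next to a word character; "\b\.com" would not
                      // match ".com" when it follows a space.
                      $start = preg_match('/^\w/', $word) ? '\b' : '';
                      $end   = preg_match('/\w$/', $word) ? '\b' : '';
                      $patterns[] = '/' . $start . $quoted . $end . '/i';
                  }

                  // Hypothetical input covering the cases discussed above.
                  $textArea = "www.a.com or www dot a dot\ncom, but polka dot complete";

                  // One call applies every pattern in turn; no loop over the input needed.
                  $cleaned = preg_replace($patterns, '', $textArea);
                  ```

                  The same array form works with str_replace() when plain substring matching is enough.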
