Hi. I am parsing a file, and I am getting some odd characters. I would like to just use ereg_replace or preg_replace to get rid of them, but I do not know how to match the characters (other than one at a time).

Here is an example of some of the characters:
ª å ¼  ä š ç ¤ ¾

Is there a pattern that I can use to match any characters that are NOT in the usual range of letters, symbols and numbers?

Thanks!!!

    Well, I found 1 solution, but I would think there is a better one out there?

    I used:

    ereg_replace("[^[:punct:][:alnum:][:space:]]","",$string)

      First, google on "ascii table", something like this: http://www.ascii.cl/htmlcodes.htm
      It's easier to do this in hex than decimal.

      ASCII \x00 - \x1F, that is 0-31, are all control characters and whitespace. Includes tab and line endings. Note, \x00 and \x1A, that is, 0 and 26, are especially dangerous because they are end-of-file or end-of-string markers in various file types.

      ASCII \x20, that is, 32, is space.

      ASCII \x21 - \x7E, that is, 33-126, are characters and symbols on your U.S. keyboard.

      ASCII \x7F, that is 127, is a control character.

      Codes \x80 - \xFF, that is, 128-255, are so-called "high ASCII".
      Not really ASCII, but usually accented characters. Sometimes they are graphical symbols. Depends on your operating system or web page encoding. Your typical Windows box uses two different sets, one at the command prompt and one elsewhere! You might not want to delete them, since every language except English uses them. We mußt coördinate our efforts like web financés, using an encyclopædic effort, for even in ye olde English, þhere may be a use for some €. Search Wikipedia on "character sets".
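To make the ranges above concrete, here is a minimal sketch that sorts each byte of a string into one of those groups. The function names `classify_byte` and `byte_histogram` are made up for illustration, and the sample string is just an assumption about what the odd bytes might look like.

```php
<?php
// Hypothetical helper: name the range a single byte falls into.
function classify_byte(string $byte): string
{
    $code = ord($byte);
    if ($code <= 0x1F || $code === 0x7F) {
        return 'control';      // \x00-\x1F and \x7F
    }
    if ($code === 0x20) {
        return 'space';        // \x20
    }
    if ($code <= 0x7E) {
        return 'printable';    // \x21-\x7E
    }
    return 'high';             // \x80-\xFF, the so-called "high ASCII"
}

// Hypothetical helper: count how many bytes of each kind a string contains.
function byte_histogram(string $s): array
{
    $counts = ['control' => 0, 'space' => 0, 'printable' => 0, 'high' => 0];
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $counts[classify_byte($s[$i])]++;
    }
    return $counts;
}

// Sample input (assumed): a tab, one space, and two high bytes.
print_r(byte_histogram("Hi\tthere \xE5\xBC"));
// control => 1, space => 1, printable => 7, high => 2
```

Running something like this over the file first shows whether the odd characters are control bytes or high bytes, which decides which of the patterns is the right one.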

      // Convert all control characters and whitespace and high-ASCII to spaces:
      $x= preg_replace('/[\x00-\x1F\x7F-\xFF]/', ' ', $x);
      
      // It's safer to specify allowed characters than banned characters, they say:
      $x= preg_replace('/[^\x20-\x7E]/', ' ', $x);
      
      // Some more examples.
      
      // Remove all control characters and high-ASCII, but allow line endings and tabs:
      $x= preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $x);
      
      // Remove all control characters, but allow line endings and tabs and high ASCII:
      $x= preg_replace('/[^\x09\x0A\x0D\x20-\x7E\x80-\xFF]/', '', $x);
      // same as
      $x= preg_replace('/[^\t\n\r\x20-\x7E\x80-\xFF]/', '', $x);
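As a rough comparison, the sketch below runs three of the whitelist patterns over one sample string so the difference is visible. The sample string is invented, and it assumes the data is in a single-byte encoding such as ISO-8859-1, where \xE9 is "é".

```php
<?php
// Assumed sample: one high byte, a tab, a CRLF line ending and a NUL byte.
$sample = "caf\xE9\tau lait\r\n\x00ok";

// Printable ASCII only; everything else becomes a space.
$a = preg_replace('/[^\x20-\x7E]/', ' ', $sample);

// Keep tabs and line endings as well; delete everything else.
$b = preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $sample);

// Additionally keep the high bytes \x80-\xFF.
$c = preg_replace('/[^\x09\x0A\x0D\x20-\x7E\x80-\xFF]/', '', $sample);

var_dump($a === "caf  au lait   ok");           // bool(true)
var_dump($b === "caf\tau lait\r\nok");          // bool(true)
var_dump($c === "caf\xE9\tau lait\r\nok");      // bool(true)
```

Only the last pattern keeps the accented letter, and only the NUL byte is removed by all three.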
      
      // Standardize line endings, remove blank lines, collapse all other whitespace to a single space, convert backticks to single quotes, and allow high-ASCII:
      $x= trim($x);
      $x= preg_replace('/[\r\n]+/', "\n", $x);
      $x= strtr($x, "\t`", " '");
      $x= preg_replace('/[^\n\x20-\x7E\x80-\xFF]/', '', $x);
      $x= preg_replace('/  +/', ' ', $x);
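If you want to reuse that last sequence, it could be wrapped in a function along these lines. This is only a sketch: the name `clean_text` and the sample input are made up, and note that `strtr()` takes the subject string as its first argument, followed by the "from" and "to" character lists.

```php
<?php
// Hypothetical wrapper around the cleanup steps described above.
function clean_text(string $x): string
{
    $x = trim($x);                                           // strip leading/trailing whitespace
    $x = preg_replace('/[\r\n]+/', "\n", $x);                // one "\n" per run of line endings
    $x = strtr($x, "\t`", " '");                             // tab -> space, backtick -> single quote
    $x = preg_replace('/[^\n\x20-\x7E\x80-\xFF]/', '', $x);  // drop remaining control characters
    $x = preg_replace('/  +/', ' ', $x);                     // collapse runs of spaces
    return $x;
}

// Assumed sample: CRLF line endings, a blank line, tabs, backticks and a bell character.
$dirty = "  one\r\n\r\ntwo\t\tthree `x`\x07  ";
var_dump(clean_text($dirty) === "one\ntwo three 'x'");       // bool(true)
```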
      