Hello all,

I am working on a project and have the following dilemma:
I have a CSV file generated by a Microsoft application that has some sort of “Microsoft-isums” what I mean by this, is that I cannot simply run a “Load data infile” to get the contents of this file into MySQL.
I can get php to read this file with the following code:

$filename = myfile.csv';
$handle = fopen($filename, "rb");
$contents = fread($handle, filesize($filename));
fclose($handle);
echo $contents;

This will print the contents of myfile.csv to the screen (without using the ‘b’ argument when defining my $handle variable all I get is a ton of binary text).
What I need to do it output the contents of this file into a plain ascii / text file so I can import it into MySQL, I have tried everything from iconv to ereg_replace with no luck..

Any advice is greatly appreciated, all I really need to do is remove all NON-VISIBLE characters from my string ($contents in the above code).

Thanks!

    Maybe you could do something like:

    $contents = preg_replace('/[\x00-\x08\x0B-\x1F]/', ' ', $contents);
    

    This would replace all special characters from hex 00 - hex 1F (except for tabs and newlines) with a space.

      NogDog wrote:

      Maybe you could do something like:

      $contents = preg_replace('/[\x00-\x08\x0B-\x1F]/', ' ', $contents);
      

      This would replace all special characters from hex 00 - hex 1F (except for tabs and newlines) with a space.

      This was a great help - thanks! However, now my output has no line returns so the import to mysql is only one line!

      Is there a way to keep my line returns and not put a space between every character in the file?

        Hmm...I probably miscalculated a hex value or something. How about something like:

        $contents = preg_replace('/[^\r\n\t\x20-\x7E\xA0-\xFF]/', ' ', $contents);
        
          NogDog wrote:

          Hmm...I probably miscalculated a hex value or something. How about something like:

          $contents = preg_replace('/[^\r\n\t\x20-\x7E\xA0-\xFF]/', ' ', $contents);
          

          This is working and the import into MySQL works great.

          Thank you for helping solve this – if you have a moment, can you tell me what you did? I don’t really understand the command

          $contents = preg_replace('/[^\r\n\t\x20-\x7E\xA0-\xFF]/', ' ', $contents);

          Other than the obvious part about assigning the output of preg_replace to my variable.. What is all the other stuff ('/[\r\n\t\x20-\x7E\xA0-\xFF]/')?

          Thanks again!

            The [...] is a negated character class, meaning "match on any character not listed within the brackets. Within that class, the \r\n\t respresent carriage-return, newline, and tab characters. the rest represents two ranges of characters represented by their hex codes. So basically we're allowing carriage returns, newlines, and tabs plus anything from hexcode 20 through FF, excluding those from hexcodes 7F through 9F; anything else will be replaced by the second argument, which in this case is a space.

              NogDog wrote:

              The [...] is a negated character class, meaning "match on any character not listed within the brackets. Within that class, the \r\n\t respresent carriage-return, newline, and tab characters. the rest represents two ranges of characters represented by their hex codes. So basically we're allowing carriage returns, newlines, and tabs plus anything from hexcode 20 through FF, excluding those from hexcodes 7F through 9F; anything else will be replaced by the second argument, which in this case is a space.

              Perfectly explained – I have officially learned something :-)

              Thanks again NodDog

                Write a Reply...