Hi,

I have spent hours (literally) trying to figure out a regular expression to clean the name of an uploaded file.

I find that my clients often try to upload non-images, so I can check whether it's a '.jpg', '.gif', etc., but what if they name their files 'cupboard.html.jpg'? (It's happened!)

This is what I have so far (after checking the file):

	$photo = 'cupboard and other illegal chars\' here!@#$%^&*(.html.jpg';

if (preg_match("/jpg|jpeg/i",$photo)){$image_type=".jpg";}
if (preg_match("/png/i",$photo)){$image_type=".png";}
if (preg_match("/gif/i",$photo)){$image_type=".gif";}

// Don't process non-images
if (preg_match('/\.(jpe?g|gif|png)/i', $photo)) {	// preg_match() instead of eregi(), which was removed in PHP 7

	// Take out illegal char in name
	list($name)=explode($image_type, $photo);
	$name = $this->cleanString($name);		
	$photo_name = $name.$image_type;

	// Rename file
	rename($dir.$photo, $dir.$photo_name);

	// Thumb photo using GD
	createdetail($photo_name, $dir.$photo_name, $thumb_w, $thumb_h);
}

// Clean string to only Alpha Numeric char
function cleanString($content) {
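	// Same behaviour as before, written with preg_replace()/str_replace() because ereg_replace() was removed in PHP 7
	return preg_replace('/[^[:alnum:]+]/', '', str_replace(' ', '_', strtolower($content)));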
}

This takes out all spaces and most problematic characters (I think), but what I want is to show spaces in the filename as underscores (_) as well...

Any ideas?

Thanks.

    In the upload array $_FILES, each file entry has an element named 'type' that holds the MIME type reported by the browser. This tells you something about the contents of the file, and you could use it to reject certain files.

      Well, you can't really trust the MIME type; that can be just as bogus as anything else sent by the client (using strrchr() could have helped to get the file extension though, by grabbing everything starting at the last '.' in the filename). The most reliable way of identifying an image file and its type would be the information returned by getimagesize().
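
A minimal sketch of both suggestions, assuming PHP 7.1 or later. The function names detect_image_extension() and claimed_extension() are made up for illustration and are not part of the original code; only the three image types mentioned in the question are mapped.

```php
<?php
// Hypothetical helper: returns '.jpg', '.png' or '.gif' when the file really
// is one of those image types, or null otherwise.
function detect_image_extension(string $path): ?string
{
    // getimagesize() inspects the file contents; index 2 of the result holds
    // an IMAGETYPE_* constant. The @ hides the notice PHP may raise when the
    // data is not a readable image.
    $info = @getimagesize($path);
    if ($info === false) {
        return null;
    }

    $map = [
        IMAGETYPE_JPEG => '.jpg',
        IMAGETYPE_PNG  => '.png',
        IMAGETYPE_GIF  => '.gif',
    ];

    return $map[$info[2]] ?? null;
}

// Hypothetical helper: the extension the name claims, i.e. everything from
// the last '.' onward, lowercased. Returns '' when there is no '.'.
function claimed_extension(string $filename): string
{
    $ext = strrchr($filename, '.');
    return $ext === false ? '' : strtolower($ext);
}
```

With these, 'cupboard.html.jpg' still claims '.jpg', but detect_image_extension() only returns an extension when the contents are actually a JPEG, PNG or GIF.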

        Thanks Guys, but I already know the file type using the line:

         list($name)=explode($image_type, $photo); 

        What I need is to clean up the uploaded file's name, taking out any weird characters such as '!@$#' etc., and replacing spaces in the new file's name with underscores...

        thanks,

          How about replacing any character that isn't: alphanumeric, underscore, dash, period, or space?

          $file_name = preg_replace('/[^a-z0-9_\.\-[:space:]]/i', '_', $file_name);
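
The replacement above swaps each disallowed character for an underscore and leaves spaces in place. A possible variant, sketched here under the assumption that the goal is lowercase names with spaces turned into underscores and everything else stripped; clean_filename() is a hypothetical name, not something from the original code.

```php
<?php
// Hypothetical helper: lowercase the name, turn whitespace into underscores,
// and drop every character that is not a-z, 0-9, underscore, dash or period.
function clean_filename(string $name): string
{
    $name = strtolower($name);
    // Each run of whitespace becomes a single underscore.
    $name = preg_replace('/\s+/', '_', $name);
    // Remove anything outside the allowed set (the dash is last in the class
    // so it is taken literally).
    $name = preg_replace('/[^a-z0-9_.-]/', '', $name);
    return $name;
}

echo clean_filename('Cupboard and other illegal chars\' here!@#$%^&*(.html.jpg'), "\n";
// prints: cupboard_and_other_illegal_chars_here.html.jpg
```

Note that periods are allowed, so the '.html' part of the name survives; only the characters are cleaned, not the extension logic.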

            Or generate a new name of your own devising? A number of gallery engines use an MD5 hash of the image file; not only is the resulting name in a reliable format, it's pretty much guaranteed to be different from any other pre-existing file. After all, what would happen if someone tried to upload both "Image%!.png" and "Image!%.png"?
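
A rough sketch of the hash-based naming idea, assuming the extension has already been determined some other way. hashed_image_name() is a hypothetical name; md5_file() is the standard PHP function that hashes a file's contents.

```php
<?php
// Hypothetical helper: build a file name from the MD5 hash of the file's
// contents plus an extension supplied by the caller (e.g. '.png').
function hashed_image_name(string $path, string $extension): string
{
    // md5_file() returns a 32-character hex string, or false on failure.
    $hash = @md5_file($path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read file: $path");
    }
    return $hash . $extension;
}
```

Because the name depends only on the contents, "Image%!.png" and "Image!%.png" get different names as long as the images differ, while two uploads of the same image end up with the same name.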

            And regarding "I already know the file type using the line": not really. That's just getting the extension from the end of the name. You're only assuming that it accurately reflects the file type. Since you've apparently got all sorts of weird strings passing themselves off as filenames, they hardly seem to be reliable witnesses.

              Thank you BradGrafelman, I think that may just work! Will let you know ASAP.

                Ok, just a note to let you know that worked very well Brad, with a little modification to get the spaces as underscores:

                $name = preg_replace('/[^a-z0-9_.-]/', '', strtolower(str_replace(' ', '_', $name)));

                Thanks!
