[RESOLVED] Unable to determine file type...

c_tcp_ip · Aug 1, 2011

Hi. I have some code that checks whether an uploaded file is of type ".csv". Funny thing is, when I upload a file it tells me the file is fine and accepts it as .csv. However, when another party uploads, it keeps saying it is not.

If I take that same file and upload it from my computer, it is accepted. I'm wondering if it is the way that different versions of Excel encode the file... but I'm a little bit stumped at the moment.

Here is the code:

//first if the extension isn't csv then kick it out
//note: the first parameter of $files is the actual name of the file from upload.php which
if ( ($fileType=$_FILES["file"]["type"] !='text/csv') ){;
	echo("this is a file of type "). $fileType . ("<br>");
	echo("this is a file of type "). filetype($_FILES["file"]["type"]). ("<br>");
	echo("<font color='red'><h2>You can only upload .CSV files...</h2></font>");
	exit();
}

I've been thinking about this problem too much, so my brain is fried in regards to it. Any help would be appreciated.

Thanks

bradgrafelman · Aug 1, 2011

From the manual page [man]features.file-upload.post-method[/man]:

PHP Manual wrote:
$_FILES['userfile']['type'] - The mime type of the file, if the browser provided this information. An example would be "image/gif". This mime type is however not checked on the PHP side and therefore don't take its value for granted.

In other words, the user's browser might pass 'text/csv', or it might pass 'zomg look at me, the best browser in the world!', or it might not pass anything at all.

For a more robust solution, consider using [man]finfo_file[/man] (or the deprecated [man]mime_content_type[/man], if you don't meet the requirements for the former).

EDIT: Also note that you really aren't doing anything at all to "determine the file type" other than trusting what the user claims it to be. In other words, I could just as easily upload "my_new_nasty_trojan.exe" and claim it to be a "text/csv" file type.
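
A minimal sketch of the server-side check suggested above, assuming the Fileinfo extension is enabled. The helper name `detect_mime_type` is made up for illustration; the path you would pass in is `$_FILES["file"]["tmp_name"]`, using the same "file" field as the original code. Keep in mind that the result is a guess based on the file's contents, so a CSV may well be reported as `text/plain` rather than `text/csv`.

```php
<?php
// Hypothetical helper: ask Fileinfo what the file contents look like,
// instead of trusting the type claimed by the client.
function detect_mime_type(string $path): ?string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);

    // finfo::file() returns false on failure; normalise that to null.
    return $type === false ? null : $type;
}

// Usage inside the upload handler (field name "file" as in the original code):
// $detected = detect_mime_type($_FILES["file"]["tmp_name"]);
```

Even then, this only narrows things down; it does not prove the file is a well-formed CSV.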

c_tcp_ip · Aug 2, 2011

Thanks for the response, Brad. With that first code I wasn't getting anything back for the file type, so I changed it and added the following:

if ( ($fileType=$_FILES["file"]["type"] != "text/csv") ){;
	echo("this is a file of type "). $fileType . ("<br>");
	echo("this is a file of type "). filetype($_FILES["file"]["type"]). ("<br>");
	echo("<font color='red'><h2>You can only upload .CSV files...</h2></font>");
	exit();
}

This seems to work, as it now tells me the file is of type "application/vnd.msexcel". Should I just allow all files of this type to pass and be uploaded, and let the other error handling code catch any problems, or should I be adamant that it be a .csv file type?

As well I'm a little confused as to why the browser would report a different mime type if it is not explicitly renamed.

I will also look into the two functions you suggested.

EDIT: I noticed it's the same code I posted originally. Just to say that filetype seems to give me an actual file type.
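
A hedged side note on the snippet itself: in PHP, `!=` binds tighter than `=`, so `$fileType = $_FILES["file"]["type"] != "text/csv"` stores the boolean result of the comparison in `$fileType` rather than the type string. Separately, `filetype()` reports what kind of filesystem entry a path is (`file`, `dir`, and so on), not a MIME type. A small sketch, where `$claimedType` is an illustrative stand-in for `$_FILES["file"]["type"]`:

```php
<?php
// $claimedType stands in for $_FILES["file"]["type"] (illustrative value).
$claimedType = "application/vnd.ms-excel";

// Without parentheses the comparison runs first and its boolean result is assigned.
$unparenthesized = $claimedType != "text/csv";

// Parenthesising the assignment keeps the string in $fileType.
$isRejected = ($fileType = $claimedType) != "text/csv";

// filetype() describes the filesystem entry behind a path, not its MIME type.
$entryKind = filetype(sys_get_temp_dir());
```

Here `$unparenthesized` ends up as `true`, while `$fileType` holds the string that was actually sent.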

johanafm · Aug 2, 2011

RFC 1867

3.3 use of multipart/form-data

The definition of multipart/form-data is included in section 7. A
boundary is selected that does not occur in any of the data. (This
selection is sometimes done probabilisticly.) Each field of the form
is sent, in the order in which it occurs in the form, as a part of
the multipart stream. Each part identifies the INPUT name within the
original HTML form. Each part should be labelled with an appropriate
content-type if the media type is known (e.g., inferred from the file
extension or operating system typing information) or as application/octet-stream

http://www.php.net/manual/en/features.file-upload.post-method.php

$_FILES['userfile']['type']
The mime type of the file, if the browser provided this information. An example would be "image/gif". This mime type is however not checked on the PHP side and therefore don't take its value for granted.

c/tcp/ip;10985369 wrote:
As well I'm a little confused as to why the browser would report a different mime type if it is not explicitly renamed.

First off, what you call "browser" may not be a browser at all. It might just as well be a PHP script I wrote to upload 'nasty_stuff.exe' while supplying a content-type value of "fluffy bunnies". NEVER trust data from any outside source. You can make no assumptions that the outside is benevolent, and neither can you assume the outside is correct.

Secondly, a user may have named their CSV file 'values.txt', which doesn't mean it's not a CSV file, just that its extension is something else. Unless the browser analyzes the file contents, there is no way it can tell you that the file is "text/csv". It is however likely to claim the file to be "text/plain".

Thirdly, even if the browser wants to supply the best possible mime type values that it can, it may not be able to do so correctly for various reasons. For example, while IE might be able to correctly determine that a given file is an Excel file, an earlier version of IE might not, if Excel has changed its file format since then. The programmers of a non-IE browser may not be able to retrieve information on all the various formats that Excel has ever had since 1985 (Mac) or 1987 (Windows).

Also, for browsers to be able to handle sniffing of all mime type formats, they'd have to be much heavier programs. Personally, I want a browser to browse the web and nothing else, just like I want an MP3-player to play MP3s and NOT browse the web (thank you Winamp 2.09a). So in the end, even if some/all browsers did this, I'd sure as hell make sure I had another browser to use, even if it meant developing it myself, and you'd once again not be able to trust the 30GB "intelligent" browsers.

And in the end, even if all browsers correctly did all of this, making you extremely trusting, remember that I could still send you a file of type "fluffy bunnies that we all love so much"...

Finally, since it's impractical for browsers to properly do the job, I don't really see why they should at all. Sure, it's nice of them to tell you that .json is application/json, .txt is text/plain etc for common formats, but since you
1. can't trust this information
2. can just as easily check the file extension yourself
3. are likely to know exactly what a format is if you want to handle it, which also means YOU are able to actually validate that the file is in this format

... I don't really see why they should supply the value to begin with.
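
For point 2, checking the extension yourself could look roughly like this. It is only a sketch: `has_csv_extension` is a made-up helper name, and the file name comes from the client, so it deserves no more trust than the mime type does.

```php
<?php
// Hypothetical helper: case-insensitive check of the client-supplied file name.
// This is a convenience filter only, not a security measure.
function has_csv_extension(string $originalName): bool
{
    // PATHINFO_EXTENSION yields the part after the last dot, or "" if there is none.
    $extension = pathinfo($originalName, PATHINFO_EXTENSION);

    return strtolower($extension) === 'csv';
}

// Usage (field name "file" as in the original code):
// $looksLikeCsv = has_csv_extension($_FILES["file"]["name"]);
```

Note that a CSV saved as 'values.txt' fails this check, which is exactly the limitation described above.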

c_tcp_ip · Aug 2, 2011

Thanks, johanafm. I did some research last night after posting and came to the same conclusion. I have to admit I'm not a big fan of this, and to me it is one issue with the 'cloud': there are so many different versions of the same browsers, plus so many browsers to begin with. Either way, this still needs to be mitigated.

I don't know how much larger this would make the browser .exe but I'm sure there is a way to address this problem.

Having said all that, the problem is still there. How would you suggest I work around this?
1) Accept whatever file they are uploading, and if the rest of the code can't parse it then I know it's not a .csv file?

2) Just check the file extension to make sure it's not .exe, or check the file itself to make sure there are no "<?php" opening tags and it's not a PHP script?

3) Use the PECL fileinfo(). I was looking online and this seems not to be a bulletproof solution either.

I know there is obviously a solution, as I've been able to upload CSV files to other sites before. Whether they got around this with little security or in other ways, though, I don't know.

Any suggestions would be greatly appreciated from you guys though.

bradgrafelman · Aug 2, 2011

Part of me wonders why you care this much about security in this instance?

If you can't parse the file as a CSV, then what's wrong with just failing out and displaying an error message to the user?

c_tcp_ip · Aug 2, 2011

bradgrafelman;10985402 wrote:
Part of me wonders why you care this much about security in this instance?

Because I'm trying to do it right the first time. Not trying to be rude brad as you are a great help but not thinking about security is what gets developers/sites/biz in trouble in the first place. lulzsec, anonymous...?

If you can't parse the file as a CSV, then what's wrong with just failing out and displaying an error message to the user?

This is what I was thinking would be best I just wanted to see if there were any other 'PHP' ways of going about this.

Thanks for your help. I'll report back if anything comes up, or close the thread.

bradgrafelman · Aug 2, 2011

c/tcp/ip wrote:
Because I'm trying to do it right the first time. Not trying to be rude brad as you are a great help but not thinking about security is what gets developers/sites/biz in trouble in the first place. lulzsec, anonymous...?

I never said anything about "not thinking about security"; I asked why you are so concerned about security "in this instance."

What's the worst that could happen? Someone uploads my_nasty_trojan.exe, calls it a "text/csv", and then your script attempts to parse it as a CSV and fails? Unless you're storing the files for later use, who cares whether or not they uploaded a CSV or a trojan? If it's a CSV, awesome, the processing will proceed. If it's a binary trojan, awesome, the processing will fail.

Sure, you can use primitive techniques such as checking the file extension or somewhat less primitive techniques to try and cut down on the need to even attempt processing such as using the Fileinfo/mime_magic methods to sniff the file type based on certain characteristics of the file's contents, but in the end there's no need to "secure" anything other than to let the processing fail (again, unless you're storing these files for later use or something of that nature).
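
A rough sketch of the "just let the processing fail" approach, using `fgetcsv()`. The helper name `parse_csv_or_fail` is made up, and the rule that every row must have the same number of columns is an assumption about what the application expects, not something CSV itself requires.

```php
<?php
// Hypothetical helper: returns the parsed rows, or null if the file does not
// look like a usable CSV. Assumes comma-separated, double-quoted fields.
function parse_csv_or_fail(string $path): ?array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null;
    }

    $rows = [];
    $columns = null;
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as an array holding one null
        }
        // NUL bytes are a cheap hint that this is binary data, not text.
        if (strpos(implode('', $row), "\0") !== false) {
            fclose($handle);
            return null;
        }
        if ($columns === null) {
            $columns = count($row); // first row fixes the expected column count
        } elseif (count($row) !== $columns) {
            fclose($handle);
            return null; // ragged row: treat the whole file as invalid
        }
        $rows[] = $row;
    }
    fclose($handle);

    // An empty file is not a useful CSV either.
    return $rows === [] ? null : $rows;
}

// Usage (field name "file" as in the original code):
// $rows = parse_csv_or_fail($_FILES["file"]["tmp_name"]);
// if ($rows === null) { /* show the "You can only upload .CSV files" message */ }
```

With this in place, the claimed mime type never needs to be consulted: a trojan renamed to .csv simply fails to parse.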
