Originally posted by Nate
1) Is it true if I have a single directory with 10,000+ (what about 100,000) files that it gets hard/time consuming for the filesystem to find a single file in that directory?
Depends on the filesystem. Windows FAT would bite, I think FAT32 would also bite, and I'm not sure about NTFS. There's also a variety of filesystems available for Unixes, with different characteristics, but most modern filesystems maintain B-tree indexes for their directories, which is the same data structure DBMSs use for their indexes. (ReiserFS is supposed to be pretty good; I haven't tried it, but I understand it uses a different data structure and is much more disk-space-efficient when it comes to storing lots of tiny files.) In effect, filesystems are databases that have been specially optimised for the task of storing files. (Windows Longhorn was originally going to use SQL Server as its filesystem, but Microsoft couldn't get the idea to work.)
To be honest, it's one of those cases where Your Mileage May Vary, and there may well be a threshold (or more than one) where the performance of one wins out over the other. (Needless to say, since the database also has to be stored in the filesystem at some point, it gets affected by the choice of filesystem as well; except this time it's one big file and one not-so-big file, instead of a lot of smaller ones.)
That's probably why you've heard conflicting stories. The thing to keep in mind is that by storing the images as files you can bring every file-handling tool at your disposal to bear on them, whereas to work with BLOBs you first have to get them out of the database.
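To make that difference concrete, here's a sketch of the two approaches. The table and column names (images, mime, image_data) are made up for illustration; the file version just streams straight from disk, while the BLOB version has to round-trip through the database first.

```php
<?php
// Serving an image stored as a file: the filesystem does the heavy lifting.
function serve_file(string $path): void {
    header('Content-Type: ' . mime_content_type($path));
    header('Content-Length: ' . (string) filesize($path));
    readfile($path);
}

// Serving an image stored as a BLOB: fetch it out of the database first.
// Table/column names here are hypothetical.
function serve_blob(PDO $db, int $id): void {
    $stmt = $db->prepare('SELECT mime, image_data FROM images WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    header('Content-Type: ' . $row['mime']);
    echo $row['image_data'];
}
```

Either way the bytes end up on the wire; the question is how many layers they pass through on the way.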
2) Are you sure it would be best to use getimagesize() (and I think I'd have to use mime_content_type(), though I'm not sure) to get all the information I need about an image each time I use it, rather than storing it one time when it's uploaded?
(To determine the dimensions of the image, getimagesize() first has to determine the type of the image; since it doesn't hurt to do so, it returns that as well, so you wouldn't need mime_content_type() on top of it.) You can benchmark the two approaches, but I suspect that any speed difference either way would be swamped by other effects on round-trip performance - other apps running on your server, the clock speed of the client computer, a misconfigured proxy server at the user's ISP, bad aircon in a Kyoto server farm ....
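To illustrate the point about getimagesize() returning everything in one call, here's a small wrapper (the function name and returned keys are my own choice, not anything standard):

```php
<?php
// One getimagesize() call yields dimensions, MIME type, and even a
// ready-made attribute string, so a separate mime_content_type() call
// is redundant for images.
function image_info(string $path): ?array {
    $info = getimagesize($path);
    if ($info === false) {
        return null; // not a recognisable image
    }
    return [
        'width'  => $info[0],
        'height' => $info[1],
        'mime'   => $info['mime'], // e.g. "image/gif"
        'attr'   => $info[3],      // e.g. 'width="1" height="1"' for an <img> tag
    ];
}
```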
I see one big advantage of storing metadata in the db: you can search and sort on it. If the content is going to be relentlessly cumulative, and assuming the upload method is sufficiently robust, I don't see too much opportunity for mistakes to creep in.
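For instance, with the dimensions recorded in a table at upload time, queries like "all images wider than X, biggest first" become trivial. The schema here (an images table with filename, width, height columns) is hypothetical:

```php
<?php
// With image metadata in a table, searching and sorting is a one-liner
// of SQL instead of a loop over getimagesize() calls on every file.
function find_wide_images(PDO $db, int $minWidth): array {
    $stmt = $db->prepare(
        'SELECT filename, width, height
           FROM images
          WHERE width >= :w
          ORDER BY width DESC'
    );
    $stmt->execute([':w' => $minWidth]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Doing the same search against files on disk would mean opening and probing every image, every time.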