Originally posted by Nate
1) Is it true if I have a single directory with 10,000+ (what about 100,000) files that it gets hard/time consuming for the filesystem to find a single file in that directory?
Depends on the filesystem. Windows FAT would bite, I think FAT32 would also bite, and I'm not sure about NTFS. There's also a variety of filesystems available for Unixes, with different characteristics, but most modern filesystems maintain B-tree indexes for their directories, which is the same data structure DBMSs use for their indexes. (ReiserFS is supposed to be pretty good; I haven't tried it, but I understand it uses a different data structure and is much more disk-space-efficient when it comes to storing lots of tiny files.) In effect, filesystems are databases that have been specially optimised for the task of storing files. (Windows Longhorn was originally going to use SQL Server as its filesystem, but Microsoft couldn't get the idea to work.)
To be honest, it's one of those cases where Your Mileage May Vary, and there may well be a threshold (or more than one) where the performance of one wins out over the other. (Needless to say, since the database also has to be stored in the filesystem at some point, it gets affected by the choice of filesystem as well; except this time it's one big file and one not-so-big file, instead of a lot of smaller ones.)
That's probably why you've heard conflicting stories. The thing to keep in mind is that by storing the images as files you can bring every file-handling tool at your disposal to bear on them, whereas to work with BLOBs you first have to get them out of the database.
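To make that difference concrete, here's a sketch of the two approaches. The table and column names (images, mime, image_data) are made up for illustration; the file version just streams straight from disk, while the BLOB version has to round-trip through the database first.

```php
<?php
// Serving an image stored as a file: the filesystem does the heavy lifting.
function serve_file(string $path): void {
    header('Content-Type: ' . mime_content_type($path));
    header('Content-Length: ' . (string) filesize($path));
    readfile($path);
}

// Serving an image stored as a BLOB: fetch it out of the database first.
// Table/column names here are hypothetical.
function serve_blob(PDO $db, int $id): void {
    $stmt = $db->prepare('SELECT mime, image_data FROM images WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    header('Content-Type: ' . $row['mime']);
    echo $row['image_data'];
}
```

Either way the bytes end up on the wire; the question is how many layers they pass through on the way.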
2) Are you sure it would be best to use getimagesize() (and I think I'd have to use mime_content_type(), though I'm not sure) to get all the information I need about an image each time I use it, rather than storing it one time when it's uploaded?
(To determine the dimensions of the image, getimagesize() first has to determine the type of the image; since it doesn't hurt to do so, it returns that as well, so you wouldn't need mime_content_type() on top of it.) You can benchmark the two approaches, but I suspect that any speed difference either way would be swamped by other effects on round-trip performance - other apps running on your server, the clock speed of the client computer, a misconfigured proxy server at the user's ISP, bad aircon in a Kyoto server farm ....
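To illustrate the point about getimagesize() returning everything in one call, here's a small wrapper (the function name and returned keys are my own choice, not anything standard):

```php
<?php
// One getimagesize() call yields dimensions, MIME type, and even a
// ready-made attribute string, so a separate mime_content_type() call
// is redundant for images.
function image_info(string $path): ?array {
    $info = getimagesize($path);
    if ($info === false) {
        return null; // not a recognisable image
    }
    return [
        'width'  => $info[0],
        'height' => $info[1],
        'mime'   => $info['mime'], // e.g. "image/gif"
        'attr'   => $info[3],      // e.g. 'width="1" height="1"' for an <img> tag
    ];
}
```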
I see one big advantage of storing metadata in the db: you can search and sort on it. If the content is going to be relentlessly cumulative, and assuming the upload method is sufficiently robust, I don't see too much opportunity for mistakes to creep in.
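For instance, with the dimensions recorded in a table at upload time, queries like "all images wider than X, biggest first" become trivial. The schema here (an images table with filename, width, height columns) is hypothetical:

```php
<?php
// With image metadata in a table, searching and sorting is a one-liner
// of SQL instead of a loop over getimagesize() calls on every file.
function find_wide_images(PDO $db, int $minWidth): array {
    $stmt = $db->prepare(
        'SELECT filename, width, height
           FROM images
          WHERE width >= :w
          ORDER BY width DESC'
    );
    $stmt->execute([':w' => $minWidth]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Doing the same search against files on disk would mean opening and probing every image, every time.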