I've written a multi-threaded PHP script that fetches tens of thousands of images daily and creates resized thumbnails of them. It works pretty reliably, but the volume of images it must handle is increasing rapidly, and it will soon be necessary to spread the work across multiple [virtual] machines. In the original script I took care to lock the records that represent pending work so that other threads/servers do not attempt to perform the same work. Now that I am considering multiple machines, I need some way to identify which machine holds the lock on which records.
How can a virtual machine uniquely identify itself? Because each new machine will be fired up in the Rackspace Cloud from an identical machine image, I don't think the machines will be able to distinguish themselves from each other based on the contents of their file systems. For instance, I think even the hostname file has the same contents, so both machines would return the same result from this Linux command:
hostname -f
Can anyone recommend a technique whereby a PHP script may uniquely identify the server on which it runs? I expect I could generate a unique ID for each machine and store it somewhere, but I'm wondering if there might be some better way to identify the machine based on its public [i.e., not-internally-stored] hostname. Note the following criteria:
* When I look in the database and see that a particular machine has locked some records, I should be able to use the name in the record's lock field to locate the machine that originally locked it. For this reason, a public hostname sounds useful.
* Because the image-fetching process restarts once each day, the ID cannot be a per-process or per-thread ID.
* Ideally the unique identifier would not be random but would be informative and would be persistent for the lifetime of the virtual machine.
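For reference, the "generate an ID and store it" fallback I have in mind would look roughly like the sketch below. It is only an illustration under some assumptions: the file path and the function name `getMachineId` are hypothetical, the file must live somewhere that is not baked into the machine image, and `random_bytes()` requires PHP 7 or later. It also falls short of the last criterion, since the suffix that keeps clones apart is random.

```php
<?php
// Hypothetical location: it must survive reboots but must NOT be part of the
// machine image, otherwise every clone would inherit the same ID.
const MACHINE_ID_FILE = '/var/lib/thumbnailer/machine-id';

/**
 * Return this machine's stored ID, generating and saving one on first use.
 */
function getMachineId(string $path = MACHINE_ID_FILE): string
{
    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory $dir");
    }

    // 'c+' opens for reading and writing, creating the file if it is
    // missing without truncating an existing one.
    $fh = fopen($path, 'c+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // Exclusive lock so threads starting at the same moment agree on one ID.
        flock($fh, LOCK_EX);

        $id = trim((string) stream_get_contents($fh));
        if ($id === '') {
            // Informative part: whatever the OS reports (may be identical on clones).
            $host = gethostname() ?: 'unknown-host';
            // Random suffix so clones of the same image still get distinct IDs.
            $id = $host . '-' . bin2hex(random_bytes(4));

            rewind($fh);
            fwrite($fh, $id . "\n");
            fflush($fh);
        }

        return $id;
    } finally {
        flock($fh, LOCK_UN);
        fclose($fh);
    }
}
```

Each thread would call `getMachineId()` once at startup and write the result into the lock field of the records it claims. Because the value is read back from the file on later runs, it stays the same across the daily restarts.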
Any help would be much appreciated.