I'm working on a cloud-computing-type situation where I would like to bring multiple servers online in response to a fluctuating workload. I'm using Cloud Servers at rackspace.com and have saved a machine image representing one functional server. I'd like to bring online new instances of this server when there is a lot of work to be done and take them offline when there's nothing to be done.
I also want each server to periodically check in with a central server to say "I'm still running". Each of these Cloud Servers will spawn with some PHP code containing a username/password that permits access to the central server, so the check-in can easily be accomplished by having the Cloud Server insert or update a record, with a timestamp, in a particular table in the centralized db (call it "helper_checkin" for now). I will then have a cron job on the central server that checks these timestamps.
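To make the check-in concrete, here is a rough sketch of what I have in mind. It assumes the central db is MySQL and that helper_checkin has a unique server_id column and an integer last_checkin column holding a Unix timestamp. Those column names, the buildCheckin() helper, and the connection details in the usage comment are all hypothetical.

```php
<?php
// Hypothetical helper: builds the insert-or-update for one check-in.
// Assumes MySQL and a UNIQUE (or PRIMARY) key on helper_checkin.server_id,
// so a repeat check-in updates the existing row instead of adding a new one.
function buildCheckin(string $serverId, int $now): array
{
    $sql = 'INSERT INTO helper_checkin (server_id, last_checkin) '
         . 'VALUES (:server_id, :ts) '
         . 'ON DUPLICATE KEY UPDATE last_checkin = VALUES(last_checkin)';

    return [$sql, [':server_id' => $serverId, ':ts' => $now]];
}

// Usage on a Cloud Server (host, db name and credentials are placeholders):
// $db = new PDO('mysql:host=central.example.com;dbname=mydb', $user, $pass);
// [$sql, $params] = buildCheckin(gethostname(), time());
// $db->prepare($sql)->execute($params);
```

The open question is still what to pass as the server identifier. gethostname() is only a stand-in here.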
The problem I have is that if there are multiple servers checking in, I need a way to distinguish them from each other so that the cron job running on the central server can grab all the records in the helper_checkin table and check the timestamp of each. If any particular timestamp is more than 15 minutes old, it's probably safe to say that the machine is no longer running, and I would like to send a notification via email to tell me the Cloud Server has crashed. Obviously, this notification should clearly indicate to a sysadmin which machine has halted so that s/he can log in and see what the problem is.
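The cron-side check I'm picturing is roughly the following. This is only a sketch: findStaleServers() is a made-up helper, and it assumes each row fetched from helper_checkin is an array with a server_id and a last_checkin Unix timestamp (both column names are my own placeholders).

```php
<?php
// Hypothetical helper: returns the ids of servers whose last check-in
// is more than $maxAge seconds old (default 900 seconds = 15 minutes).
function findStaleServers(array $rows, int $now, int $maxAge = 900): array
{
    $stale = [];
    foreach ($rows as $row) {
        if ($now - (int) $row['last_checkin'] > $maxAge) {
            $stale[] = $row['server_id'];
        }
    }
    return $stale;
}

// In the cron script (the address is a placeholder):
// foreach (findStaleServers($rows, time()) as $id) {
//     mail('sysadmin@example.com', "Cloud Server $id stopped checking in",
//          "No check-in from $id for more than 15 minutes.");
// }
```

Whatever ends up in server_id is what the sysadmin sees in the email, which is why I'd like it to be something s/he can connect to directly.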
I'm leaning toward using the hostname of each server as returned by the [man]gethostname[/man] function. The problem with this is that these hostnames are not FQDNs, but rather alphanumeric strings, possibly with some hyphens. I'm not even sure they will be unique. I want to avoid having to put some unique string in the PHP code or file system of each server that gets spawned, because this just seems like an extra step. The ideal situation would be a PHP function that returns the public-facing IP address of each Cloud Server (e.g., 50.57..) and not the IP address of the Cloud Server on the internal network (e.g., 10.181..). That would make it extremely easy to just log in to the server in question and seek out the source of the problem. I've been looking around, but none of the functions I've found seem quite right.
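The closest I've come is filtering a list of candidate addresses with [man]filter_var[/man], which can reject private and reserved ranges. A minimal sketch, where firstPublicIp() is my own hypothetical helper and the addresses in the usage comment are stand-ins rather than my real ones:

```php
<?php
// Hypothetical helper: returns the first IPv4 address in $candidates that is
// neither private (e.g. 10.x.x.x, 192.168.x.x) nor reserved, or null if none.
function firstPublicIp(array $candidates): ?string
{
    $flags = FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
    foreach ($candidates as $ip) {
        if (filter_var($ip, FILTER_VALIDATE_IP, $flags) !== false) {
            return $ip;
        }
    }
    return null;
}

// Usage with stand-in addresses:
// firstPublicIp(['10.0.0.5', '8.8.8.8']);
```

What I'm missing is a reliable source for the candidate list. Something like gethostbynamel(gethostname()) might work, but I assume that depends on how the hostname resolves on each spawned server, and I haven't verified that it ever yields the public address.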
Any thoughts would be much appreciated.