I've written a multi-threaded PHP script that fetches tens of thousands of images daily and creates resized thumbnails of them. It works pretty reliably, but the volume of data and images it must handle is increasing rapidly, and it will soon be necessary to spread the fetching across multiple [virtual] machines. In the original script I took care to lock the records that represent pending work so that other threads/servers do not attempt to perform the same work. Now that I am considering multiple machines, I need some way to record which machine holds the lock on which records.

How can a virtual machine uniquely identify itself? Because each new machine will be fired up in the Rackspace Cloud from an identical machine image, I don't think there will be any way for the machines to distinguish themselves from each other based on the contents of their file systems. For instance, I think even the hostname file has the same contents, so both machines would return the same result from this Linux command:

hostname -f

Can anyone recommend a technique whereby a PHP script may uniquely identify the server on which it runs? I expect I could generate a unique ID for each machine and store it somewhere, but I'm wondering if there might be some better way to identify the machine based on its public [i.e., not-internally-stored] hostname. Note the following criteria:
* When I look in the database and see that a particular machine has locked some records, I should be able to use the name in the record's lock field to locate the machine that originally locked it. For this reason, a public hostname sounds useful.
* Because the image-fetching process restarts once each day, the ID cannot be a per-process or per-thread ID.
* Ideally the unique identifier would not be random but would be informative and would be persistent for the lifetime of the virtual machine.
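
For reference, the "generate a unique ID and store it somewhere" fallback I have in mind would look roughly like the sketch below. The function name getMachineId and the file path are made up for illustration, and random_bytes() needs PHP 7 or later. The ID has to be created on the instance's first run rather than baked into the machine image, otherwise every clone would share it. This would cover the persistence criterion but not the "informative" one.

```php
<?php
// Hypothetical helper: returns a per-machine ID, creating it on first use.
// $path is an assumption; any file that survives reboots will do.
function getMachineId($path)
{
    // Reuse the stored ID if one has already been written.
    if (is_readable($path)) {
        $id = trim((string) file_get_contents($path));
        if ($id !== '') {
            return $id;
        }
    }

    // 16 random bytes rendered as 32 hex characters (PHP 7+).
    $id = bin2hex(random_bytes(16));

    // LOCK_EX keeps two threads from interleaving their writes.
    if (file_put_contents($path, $id . "\n", LOCK_EX) === false) {
        throw new RuntimeException("Cannot write machine ID to $path");
    }
    return $id;
}
```

Calling getMachineId(__DIR__ . '/machine-id') on every daily restart should keep returning the value generated on the first run, as long as the file is not deleted.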

Any help would be much appreciated.

    I'm not 100% sure, but wouldn't you have a unique IP per VM? In which case you could use one of the following to get the information you need:

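    $a = $_SERVER['HTTP_HOST'];         // set by the web server, per request
    $b = $_SERVER['SERVER_NAME'];       // set by the web server
    $b2 = exec('hostname -f');          // FQDN as the machine itself sees it
    $c = $_SERVER['SERVER_ADDR'];       // set by the web server
    $d = apache_getenv("SERVER_ADDR");  // only exists when PHP runs under Apache
    $e = php_uname('a');  //not sure what this will get you on VM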
    

    Just a thought...

      I think it's also possible (required?) to attach a UUID to each cloud instance. You might check with whoever is handling the provisioning?

        Thanks for the input, guys.

        A couple of things to note:
        * The script runs from the CLI, not HTTP.
        * While the cron job that starts these virtual machines may receive a UUID at startup, the instance itself may not. I.e., when an instance starts up, it has no idea what UUID the API may have returned to the script that was responsible for starting the instance.

        I'm going to do a bit of testing/research with the Rackspace API. It may be necessary for the control script to take an extra step to inform each new instance of its UUID or public IP address.

          I read someplace on this that you might get a serial number from dmidecode(8) on Linux ... if that applies.

            dalecosp;11017445 wrote:

            I read someplace on this that you might get a serial number from dmidecode(8) on Linux ... if that applies.

            Ideally, this identifier would be a hostname or IP address. The basic idea is that I spawn these machines, which somehow know what their IP/FQDN is; they contact my central server and lock some records by entering this identifier in a particular column of those database records. When I visit that central server later and check the DB, if there are problems, I can tell which server is having the trouble and, based on its ID, open an SSH session and log in to it. If we use some kind of serial number, then I would need to log in to each slave machine and check the local serial number until I locate the culprit.

            From what I can tell, commands like the CLI "hostname -f" are going to return an internal hostname on the Amazon LAN rather than a publicly accessible hostname.

            Still mulling this over.

              sneakyimp;11017787 wrote:

              Ideally, this identifier would be a hostname or IP address. The basic idea is that I spawn these machines, which somehow know what their IP/FQDN is; they contact my central server and lock some records by entering this identifier in a particular column of those database records. When I visit that central server later and check the DB, if there are problems, I can tell which server is having the trouble and, based on its ID, open an SSH session and log in to it. If we use some kind of serial number, then I would need to log in to each slave machine and check the local serial number until I locate the culprit.

              From what I can tell, commands like the CLI "hostname -f" are going to return an internal hostname on the Amazon LAN rather than a publicly accessible hostname.

              Still mulling this over.

              So the problem is in "how do they know their IP/FQDN", or otherwise? If the script could read both a UUID and an IP, couldn't you put all that info in the DB and then have a quick reference on how/which one to access? There must be something I'm still thick-headed on, here. (Not surprising, knowing myself as I do 😉 )

                sneakyimp;11017787 wrote:

                From what I can tell, commands like the CLI "hostname -f" are going to return an internal hostname on the Amazon LAN rather than a publicly accessible hostname.

                And is there any chance they've already solved this problem for you? For example, in many DNS PTR record schemes, the big guys usually have the IP built into the hostname:

                host 64.126.192.78
                78.192.126.64.in-addr.arpa domain name pointer s64-126-192-78.nyc.sta.foobar.com.
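
                In PHP, a rough sketch of that idea might look like the following. It assumes the PTR name embeds the address as four dash-separated octets, as in the made-up example above; ipFromPtrName is just a name I invented, and your provider's naming scheme may well differ.

                ```php
                <?php
                // Hypothetical helper: pull a dotted-quad IP out of a PTR-style hostname
                // such as "s64-126-192-78.nyc.sta.foobar.com". Returns null if none is found.
                function ipFromPtrName($hostname)
                {
                    // Look for four groups of 1-3 digits joined by dashes.
                    if (!preg_match('/(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})/', $hostname, $m)) {
                        return null;
                    }
                    $ip = "$m[1].$m[2].$m[3].$m[4]";

                    // filter_var() rejects out-of-range octets such as 999.
                    return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) ?: null;
                }
                ```

                You could feed it the result of PHP's built-in reverse lookup, e.g. ipFromPtrName(gethostbyaddr($someIp)). Note that gethostbyaddr() hands back the unmodified IP when the lookup fails, so a null here would mean there is no usable PTR name.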
                  6 months later

                  Revisited this issue today. Spoke with Rackspace customer support and was informed that there's no way for a given Cloud Server (i.e., one of Rackspace's virtual servers) to determine its own public IP address through any native Linux command or PHP function. It might be possible to set up the Rackspace API on this cloud server (with API credentials and everything), then get a listing of all allocated cloud servers for the account and perhaps correlate some information in there with locally available information to determine a server's own public IP address, but it all sounds very convoluted to me.

                  The Linux support guy suggested this command, which connects to a third-party server that reports one's public IP:

                  curl http://icanhazip.com
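
                  The same lookup could probably be done from PHP itself. Here is a minimal sketch, assuming allow_url_fopen is enabled and that the service replies with nothing but the address; fetchPublicIp and its $fetch parameter are names I made up, the latter only so the parsing can be tried without hitting the network.

                  ```php
                  <?php
                  // Hypothetical helper: ask an external "what is my IP" service for this
                  // machine's public address. Returns null on failure or unexpected output.
                  function fetchPublicIp($url = 'http://icanhazip.com', $fetch = null)
                  {
                      // Default fetcher: plain HTTP GET with a 5-second timeout so a
                      // dead service cannot hang the daemon.
                      $fetch = $fetch ?: function ($u) {
                          $ctx = stream_context_create(array('http' => array('timeout' => 5)));
                          return @file_get_contents($u, false, $ctx);
                      };

                      $body = $fetch($url);
                      if ($body === false) {
                          return null;
                      }

                      // Accept the response only if it is a well-formed IP address.
                      return filter_var(trim($body), FILTER_VALIDATE_IP) ?: null;
                  }
                  ```

                  This still depends on a third-party server being reachable, so I would call it once at startup and cache the result rather than on every lock.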

                  I did manage to concoct this script in PHP, which will spit out the LAN IP bound to eth1:

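                  <?php

                  // Extract the IPv4 address bound to eth1 from ifconfig's "inet addr:" line.
                  // PHP leaves "$2" alone here because variable names cannot start with a digit.
                  $cmd = "ifconfig eth1 | grep \"inet addr\" | awk '{print $2}' | awk -F: '{print $2}'";

                  $output = array();
                  $return_val = NULL;
                  exec($cmd, $output, $return_val);

                  echo "return val:" . $return_val . "\n";

                  print_r($output);
                  echo "\n";

                  // $output stays empty if eth1 is missing or has no "inet addr" line.
                  if (empty($output)) {
                      echo "Could not determine the IP address of eth1\n";
                  } else {
                      echo "Your IP address is " . trim($output[0]) . "\n";
                  }

                  ?>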
                  
                    dalecosp;11018277 wrote:

                    And is there any chance they've already solved this problem for you? For example, in many DNS PTR record schemes, the big guys usually have the IP built into the hostname:

                    host 64.126.192.78
                    78.192.126.64.in-addr.arpa domain name pointer s64-126-192-78.nyc.sta.foobar.com.

                    Not sure what you are suggesting, dalecosp. The hostname doesn't have any IP information in it. Calling the host command on its eth1 IP results in 'not found':

                    [root@my-host-name ImageDaemon]# host 10.177.16.198
                    Host 198.16.177.10.in-addr.arpa. not found: 3(NXDOMAIN)