dalecosp;11028321 wrote:As your consumer boxes handle 12K jobs/hour, I suppose the first thing you want to decide is how many hours you want to wait for things to be done?
To call these 'boxes' isn't really accurate: they're just virtual servers, each running as a timeslice on actual hardware under a hypervisor -- at least I think so. Q.v. Rackspace Cloud Servers.
dalecosp;11028321 wrote:Your master node would divide the jobs into chunks of 12K records, or 24K records ... etc. It would keep a record of which ones were assigned. Your slave nodes would be responsible for reporting back to the master when they were done, and the master would check the job queue again and either hand out another chunk of work or give the slave instructions to go home for the day.
I have designed things a bit differently from this, in that each of the Cloud Servers (i.e., the slaves, consumers, ImageDaemons) runs a query that checks my job table for eligible records, e.g.:

SELECT * FROM jobs_table WHERE record_lock_microtime IS NULL AND record_lock_name IS NULL AND fetch_failures < 10 ORDER BY blah blah blah LIMIT 500

This seemed advantageous because it keeps the master node from becoming a bottleneck when slaves need work, eliminates the need to serialize all kinds of data for transmission to a slave, and lets each slave talk directly to the db servers as jobs either complete or fail. Basically, once a slave is in motion it talks directly to the database, and the jobs table is the clearing-house for all completed work.
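In concrete terms, each consumer's loop looks roughly like the sketch below. It assumes PDO against MySQL and the column names from the query above; the UPDATE-first "claim" step, the id column, and process_job() are placeholders for illustration, not the actual production code.

<?php
// A minimal sketch of one consumer's claim-and-work loop, assuming PDO/MySQL
// and the column names from the query above (record_lock_name,
// record_lock_microtime, fetch_failures). The UPDATE-first "claim" step, the
// id column, and process_job() are placeholders, not the actual production code.
$pdo = new PDO('mysql:host=db.example.com;dbname=queue', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$me = gethostname() . ':' . getmypid();   // a per-consumer lock name

// Claim a batch: only rows that are still unlocked get stamped with our name,
// so two consumers can't end up working the same 500 records.
$claim = $pdo->prepare(
    "UPDATE jobs_table
        SET record_lock_name = ?, record_lock_microtime = ?
      WHERE record_lock_name IS NULL
        AND record_lock_microtime IS NULL
        AND fetch_failures < 10
      LIMIT 500");
$claim->execute(array($me, microtime(true)));

// Fetch only the rows we just claimed and work through them.
$select = $pdo->prepare("SELECT * FROM jobs_table WHERE record_lock_name = ?");
$select->execute(array($me));

foreach ($select->fetchAll(PDO::FETCH_ASSOC) as $job) {
    try {
        process_job($job);   // hypothetical: fetch/resize the image, etc.
        // Deleting the finished row is just one possibility; how completed
        // jobs are actually recorded isn't shown here.
        $pdo->prepare("DELETE FROM jobs_table WHERE id = ?")
            ->execute(array($job['id']));
    } catch (Exception $e) {
        // Release the lock and bump the failure counter so the row becomes
        // eligible again (up to 10 tries, per the WHERE clause above).
        $pdo->prepare(
            "UPDATE jobs_table
                SET record_lock_name = NULL, record_lock_microtime = NULL,
                    fetch_failures = fetch_failures + 1
              WHERE id = ?")
            ->execute(array($job['id']));
    }
}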
dalecosp;11028321 wrote:Is that helpful? Possible? I have to admit you're somewhat over my head on this ;-)
It's always helpful to have someone chime in as it stimulates thought. In explaining it, I understand it better.
The job runs effectively at the moment, but I need a better way to control the slaves from the central machine. I'm thinking each slave should run a web server, and the master node would communicate with the slaves over HTTPS. This is a bit tricky because the multi-threaded (multiprocessing?) PHP script that runs on each slave machine is written to launch and die from the CLI (a cron job restarts it periodically). It seemed like a bad idea to try to combine that concurrent code with Apache -- I still think that's true.
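One way the HTTPS idea could work without mixing the concurrent code into Apache is to keep the web-facing part trivial: the script under Apache only records a command, and the CLI daemon polls for it between batches. A rough sketch follows; the file path, parameter names, and shared secret are all invented for illustration.

<?php
// control.php -- sketch only: the web server just records a command for the
// CLI daemon, so none of the concurrent code runs under Apache.
$secret = 'change-me';
$token  = isset($_POST['token']) ? $_POST['token'] : '';
$cmd    = isset($_POST['cmd'])   ? $_POST['cmd']   : '';

if ($token !== $secret) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
if (in_array($cmd, array('run', 'pause', 'stop'), true)) {
    file_put_contents('/var/run/imagedaemon.command', $cmd);
    echo "ok\n";
} else {
    header('HTTP/1.1 400 Bad Request');
}

// ...and in the CLI daemon's main loop, before claiming the next batch of jobs:
//
//   $cmd = trim((string) @file_get_contents('/var/run/imagedaemon.command'));
//   if ($cmd === 'stop')  { exit(0); }            // cron will relaunch it later
//   if ($cmd === 'pause') { sleep(30); continue; }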