ian001;11037229 wrote:
is the correct standard way to build large websites?
Large will probably mean different things to different people. But looking at the 5k users per day, I'd say that
1. You should definitely not need more than one server
2. The purpose of using two servers would be to avoid a single point of failure
With a two server setup, I'd let both servers serve all users (logged in or not). And regardless of that, if the two servers serve the same pages, those pages should definitely use the same URL mapping. First off because not doing things the same way on both servers is highly likely to cause all sorts of errors and problems. Secondly, having multiple URLs point to the same resource will lead to worse search ratings (or so I have heard, but I have not verified it or read it first hand).
ian001;11037229 wrote:
If you update the site through the admin, one site is updated instantly, however the other site takes a hour and half to update, because the databases need to sync.
Introducing more servers adds complexity which needs to be handled. However, both web servers should be using the same database(s) for the same queries. You may of course have both multiple database servers and multiple web servers, but your web servers should be using those database servers in the same way. For example, if you have one db master and one db slave, you would be running updates and inserts on the master and selects on either master or slave. The push rate from master to slave(s) could probably be specified, but I have never done this myself, so I do not know for certain. However, while the slave may of course be lagging behind the master slightly, this should (or at least could) be no more than seconds, rather than hours.
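A minimal sketch of that read/write split, in Python. The Router class and the server names are made up for illustration, and looking at the first word of the SQL is a simplification, not how any particular database driver does it:

```python
import random


class Router:
    """Sends writes to the master and reads to the master or any slave."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def pick(self, sql):
        # Look only at the first keyword of the statement (a simplification).
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            # Reads may go to the master or to any slave.
            return random.choice([self.master] + self.slaves)
        # Inserts, updates and everything else go to the master.
        return self.master


router = Router("db-master", ["db-slave"])
print(router.pick("UPDATE pages SET body = 'new' WHERE id = 1"))
print(router.pick("SELECT body FROM pages WHERE id = 1"))
```

Both web servers would use the same router configuration, so they see the same data apart from whatever replication lag the slave has.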
Another, more likely, reason for these slow updates on the other server is that your web servers cache full and/or partial pages, or cache data locally, without invalidating the cache on the other server.
Example:
When page A is requested and no cached copy of it exists, the server generates it and caches it for 4 hours as a static html file on disk, which is local to each web server. If the file is older than 4 hours, the server will recreate it with fresh data from the database.
At this point in time, both server #1 and #2 have it cached.
Some admin connected to #1 posts new data. #1 receives the data, stores it in the database and invalidates its cache - probably by simply deleting the static html file for A.
#2 receives no notification about updates to A and does nothing to its file. For up to 4 hours, #2 will keep serving stale data.
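The scenario above can be replayed in a few lines of Python. This is only a simulation under my own assumptions: a dict stands in for the shared database, another dict stands in for each server's local disk, and time is passed in by hand as seconds. The WebServer class and its method names are invented for the example:

```python
FOUR_HOURS = 4 * 60 * 60


class WebServer:
    def __init__(self, name, database, ttl=FOUR_HOURS):
        self.name = name
        self.database = database  # shared between all web servers
        self.ttl = ttl
        self.cache = {}           # local to this server: page -> (html, created_at)

    def get(self, page, now):
        entry = self.cache.get(page)
        if entry is not None and now - entry[1] <= self.ttl:
            return entry[0]       # cached copy is not older than the lifetime
        html = self.database[page]
        self.cache[page] = (html, now)
        return html

    def admin_update(self, page, html):
        # Store in the database and invalidate only this server's cache.
        self.database[page] = html
        self.cache.pop(page, None)


db = {"A": "old content"}
s1, s2 = WebServer("#1", db), WebServer("#2", db)
s1.get("A", now=0)
s2.get("A", now=0)                    # both servers now have A cached
s1.admin_update("A", "new content")   # admin posts through #1
print(s1.get("A", now=60))            # new content
print(s2.get("A", now=60))            # old content (stale)
print(s2.get("A", now=FOUR_HOURS + 1))  # new content, after the cache expired
```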
There are various ways of dealing with this. One is to lower the cache lifetime. For example, a cache lifetime of 15 minutes might be viewed as acceptable. But this is a decision that has to be made when designing the system. If serving stale data is never acceptable in this case, the other server (the one not receiving the update) always has to be informed. With only two servers, this can be handled by one server directly telling the other. With more servers, each server could of course also have a list of all other servers and send a request to each of them. Another approach would be to send a UDP broadcast datagram instead.
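As a rough sketch of the "tell every other server" variant, here is one way it could look in Python with plain UDP datagrams sent to a list of peers. The message format ("invalidate <page>"), the function names and the addresses are all assumptions of mine, and a dict again stands in for the local file cache:

```python
import socket


def notify_peers(peers, page):
    """Send an 'invalidate <page>' datagram to every (host, port) in peers."""
    message = ("invalidate " + page).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for address in peers:
            sock.sendto(message, address)


def handle_one_notification(sock, cache):
    """Read one datagram from sock and drop the named page from the local cache."""
    data, _sender = sock.recvfrom(1024)
    command, _, page = data.decode("utf-8").partition(" ")
    if command == "invalidate":
        cache.pop(page, None)
        return page
    return None


# Demo on the loopback interface: one socket plays the part of server #2.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.settimeout(2.0)
cache_on_2 = {"A": "<html>old</html>"}

notify_peers([listener.getsockname()], "A")   # server #1 announces the update
handle_one_notification(listener, cache_on_2)
print(cache_on_2)
listener.close()
```

Keep in mind that UDP gives no delivery guarantee, so a lost datagram means the other server stays stale until its cache lifetime runs out. A short lifetime as a safety net is probably still a good idea.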
Yet another approach, perhaps especially viable when caching data rather than web pages or parts thereof, is to have one memcache server on which data is cached and from which all web servers fetch their data (if it's not present, they fetch from the db and re-cache it on the memcache server). This obviously reintroduces a single point of failure though. You could have two memcache servers, in which case you always need to write to both when caching and invalidate on both when invalidating. Another approach is to simply let the memcache server fail and have the web servers turn to the database whenever they cannot reach the memcache server. But if your site needs the speed improvement from memcaching, it will become slow using this approach, and it could potentially cause your db servers to come to a grinding halt.
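The "fall back to the database when the cache server is unreachable" idea could look roughly like this. Note that FakeCache is an in-memory stand-in I made up so the example runs on its own, not a real memcache client, and CacheUnavailable and fetch are equally hypothetical names:

```python
class CacheUnavailable(Exception):
    """Raised by the stand-in cache when the cache server cannot be reached."""


class FakeCache:
    def __init__(self):
        self.data = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise CacheUnavailable(key)
        return self.data.get(key)

    def set(self, key, value):
        if not self.up:
            raise CacheUnavailable(key)
        self.data[key] = value


def fetch(key, cache, load_from_db):
    """Try the shared cache first; on a miss or an outage, go to the database."""
    try:
        value = cache.get(key)
    except CacheUnavailable:
        return load_from_db(key)      # cache is down: every request hits the db
    if value is None:
        value = load_from_db(key)     # cache miss: load and re-cache
        try:
            cache.set(key, value)
        except CacheUnavailable:
            pass
    return value


db_hits = []

def load_from_db(key):
    db_hits.append(key)
    return "row for " + key

cache = FakeCache()
fetch("A", cache, load_from_db)   # miss, goes to the db
fetch("A", cache, load_from_db)   # served from the cache
cache.up = False
fetch("A", cache, load_from_db)   # cache down, goes to the db again
print(len(db_hits))               # 2
```

The last call shows the downside mentioned above: with the cache gone, every single request lands on the database.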
Either way, if the guy setting up these things can't get things to work in the way you have decided it should work, perhaps he should be replaced. Or perhaps you "simply" need to discuss how things should work until everyone is on the same page. One of the annoying issues when working with others is that they sometimes do not know how you want things to work without first telling them… Or sometimes you tell them and they understand things differently…
One thing that may help is to also ask him why he has chosen a certain approach. For example, "why have you chosen to use different urls on the two servers?". In this case, I find it highly unlikely that there could be a good answer… but who knows. At least you will get an idea of what he's up to.