Currently using mysqldump which is not a good solution for backing up live databases that are constantly being accessed.
What are some other (better) backup solutions?
m@tt wrote:Currently using mysqldump which is not a good solution for backing up live databases that are constantly being accessed.
Why is that not a good solution?
Either it doesn't give a consistent database wide state on backup, or it has to lock the whole database against writes while backing up.
With MVCC you can get a complete snapshot of the db in serializable mode without interrupting writes. PostgreSQL does this with pg_dump, and InnoDB tables can do it with their special, commercial backup program. I also believe there's a workaround out there to do snapshot backups with mysqldump and InnoDB tables, but I'm not familiar with it.
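For the record, that workaround is roughly along these lines (untested here, and it only protects InnoDB tables - MyISAM tables get no such snapshot):

    # --single-transaction opens a consistent read snapshot for InnoDB tables,
    # so the dump sees a single point in time and writers are never blocked
    mysqldump --single-transaction --all-databases | gzip > /backup/all-dbs.sql.gz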
Of course, you know what my solution is...
Thank you for the replies...
Sxooter, I'm not familiar with MVCC - what is it?
MVCC stands for Multi-Version Concurrency Control. The way it works is that every time you update a tuple in the database, you leave the old version in place and make a new one with a new timestamp on it. So, let's suppose that two updating transactions start at 00:00:00 and 00:00:05 respectively. A reading transaction running in serializable mode that starts at 00:00:02 and runs until 00:00:10 will not see the changes made by the updating transaction that started at 00:00:05, and that updating transaction will not block the reading done by the transaction that started at 00:00:02.
InnoDB tables, Oracle, and PostgreSQL all use MVCC update methods, which means readers and writers basically never block each other; writers only clash when they are waiting to update the same tuple.
This system allows for a coherent backup taken at any time, since you don't have to do any special locks to get an exact "snapshot" of the db at any particular time.
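A quick sketch of the idea on InnoDB, if you want to see it for yourself (database and table names are made up; run each command in its own session):

    # session 1 -- the "backup" reader takes a consistent snapshot
    mysql mydb -e "
      START TRANSACTION WITH CONSISTENT SNAPSHOT;
      SELECT COUNT(*) FROM orders;   -- state as of the snapshot
      SELECT SLEEP(30);              -- pretend the dump takes a while
      SELECT COUNT(*) FROM orders;   -- same answer, despite session 2's insert
      COMMIT;"

    # session 2 -- run this meanwhile; the writer is not blocked
    mysql mydb -e "INSERT INTO orders (customer, total) VALUES ('acme', 9.99);"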
Your best backup solution is MASTER/SLAVE REPLICATION as it gives you an up-to-date copy that can be online in minutes should the master fail.
I too am a big fan of replication as a form of online backup. But don't forget to make backups regularly that leave the site and are archived somewhere in case of catastrophic damage, like from a fire or something.
Also, point-in-time recovery (PITR) is nice. Different feature, similar results: more flexibility, but a bit more work to recover from a failure. It's not really a "hot backup" solution so much as a way to restore the database right up to the moment just before someone did something that made them go "Oh Crap!".
I sincerely wish both pgsql and mysql supported differential backups though... Someday.
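For the curious, the PostgreSQL flavour of PITR goes roughly like this (paths and the backup label are only placeholders):

    # 1. have the server archive every completed WAL segment somewhere safe
    echo "archive_command = 'cp %p /mnt/backup/wal_archive/%f'" >> "$PGDATA/postgresql.conf"
    # 2. take a base backup of the data directory between start/stop markers
    psql -c "SELECT pg_start_backup('nightly');"
    tar czf /mnt/backup/base-backup.tar.gz "$PGDATA"
    psql -c "SELECT pg_stop_backup();"
    # 3. to recover: restore the base backup, then replay the archived WAL with
    #    recovery_target_time set to just before the "Oh Crap!" moment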
Quite agree Sxooter, off-site archiving is a must. It's also something your host should be doing as a matter of course for all servers they run. And if your host does not have a disaster recovery process in place then you should not be using them.
MySQL will replicate over a slow link, so you should be replicating to an offsite server anyway. If the host goes offline/out-of-business/has-a-fire/gets-burgled/admin-gets-drunk-and-p****s-on-the-server, just update DNS to point to the slave and you are back in business at once; essential if yours is a mission-critical application.
I have been reading into replication but I'm curious... if it is an exact mirror and the master copy gets damaged, won't it replicate that damage to the slave?
I'm also curious what type of load, if any, replication will put on the master server.
With master/slave replication, shouldn't DNS changes be unnecessary - i.e. things will automatically failover to the slave? I guess with some sort of load balancer. Waiting for DNS changes to propagate can take hours to days...
How in the world could differential backups be done - affix a last modified timestamp to every single piece of data in a database?
csn wrote:With master/slave replication, shouldn't DNS changes be unnecessary - i.e. things will automatically failover to the slave? I guess with some sort of load balancer. Waiting for DNS changes to propagate can take hours to days...
How in the world could differential backups be done - affix a last modified timestamp to every single piece of data in a database?
I agree changing the DNS would not be a practical solution but you could always just change the mysql server IP in your scripts should you need to...
If you have a read of the manual you will see that this sort of replication is based on the slave getting a copy of a binary log in which the master keeps a record of all db changes. The slave then applies them to its own copy of the db. Although binary logging does involve some overhead for the master, it is not much compared to the overhead of the actual table insert or update. Sending the binary log file to a slave consumes only one thread on the master as well, leaving plenty of threads to carry on with its real work.
Master/slave, or master/slaves, is a very good way to load balance a busy db. All updates on 1 server with multiple servers handling the reads. Normal db load in most environments is 8:1 reads to writes.
The manual also has important information about how to script for master failure in that configuration - with one of the slaves being promoted to master, etc.
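For reference, the usual shape of the setup is roughly this (server ids, hostname and credentials are made up - the manual has the real procedure):

    # master my.cnf -- enable the binary log and give it a unique server id
    #   [mysqld]
    #   log-bin   = mysql-bin
    #   server-id = 1
    #
    # slave my.cnf
    #   [mysqld]
    #   server-id = 2
    #
    # on the slave, point it at the master and start replicating
    # (the 'repl' account needs the REPLICATION SLAVE privilege on the master)
    mysql -e "CHANGE MASTER TO
                MASTER_HOST='master.example.com',
                MASTER_USER='repl',
                MASTER_PASSWORD='secret',
                MASTER_LOG_FILE='mysql-bin.000001',
                MASTER_LOG_POS=4;
              START SLAVE;"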
CSN, my suggestion of master/slave replication was for improved backup, with the added bonus of no downtime or lock conflicts when the backup is run. You get a backup that is as up-to-date as your replication schedule permits.
The other factor was that you could replicate to a backup at a totally different site, giving real disaster protection and recovery. In such a situation, it will almost certainly be on a different network, or subnet, and so dns will need to be updated.
Now, if it is a change to the 'glue' record at the registry, then yes dns will take a long time to replicate through the net. If it is only a dns update on your name servers then you can flush the old record and the update is done. Of course, routers with dns cache may delay things, but with a short TTL this should not be for long. If it really did become a problem then someone has misconfigured the router cache. Even then, an admin worth his salt could force the dns update through the routing protocol being used. (would have been RIS when I was administering web infrastructure but I don't know what they use now)
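For example, something like this keeps that window small (the record and name below are only illustrative):

    #   db.example.com.   300   IN   A   192.0.2.10    ; 5-minute TTL
    dig +noall +answer db.example.com A    # check what TTL resolvers are actually handing out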
PS, yes m@tt, if the db gets corrupted data in it then that will probably be replicated. But it would also be in the backup if you used another method like tar or sqldump.
Bear in mind how it works, though: the binary log the slave receives is a record of the changes the master executed, and the slave's SQL thread replays them against its own copy. That means schema changes (DDL) are carried across just like data changes - a bad ALTER or DROP on the master will reach the slave too. The master itself only ships the log file; it is the slave that does the executing. Your window for catching an "Oh Crap!" mistake is the replication lag, or whatever delay you deliberately build into the schedule, before the slave applies the change.
Backups is one thing. But having to rely on DNS propagation and a sysadmin for failover in a production setup is unacceptable.
Actually, csn, it all depends on what you're doing in production.
It is quite possible to have a system that is better halted and waiting than failed over, depending on the usage.
Requirements dictate usage, and nothing is foolproof. In something like payroll processing, or batch-oriented jobs, human intervention is usually preferred. In some setups, asynchronous replication with a 12 hour delay is needed. In others, instantaneous failover, even with some possible data loss, is preferred - maybe a real-time tracking system.
There is no one-size-fits-all in replication and failover.
Thank you all for your wisdom. I searched this board and this is probably the best thread related to the subject.
We are going to go with replication to a local box in our office and run backups (mysqldump->gzip) on the local machine so that it does not interfere with our active database. I think this is the best solution for us.
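Something along these lines, run from cron on the slave, is what we have in mind (paths and retention are just an example):

    #!/bin/sh
    # nightly dump taken on the local slave -- the production master never feels it
    STAMP=$(date +%Y-%m-%d)
    mysqldump --single-transaction --all-databases | gzip > /backup/all-dbs-$STAMP.sql.gz
    # keep two weeks of dumps around
    find /backup -name 'all-dbs-*.sql.gz' -mtime +14 -delete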
And in a disaster, and I mean a real disaster like your ISP's premises being burnt out, then any alternate server is gone as well. That is why any realistic disaster recovery process must include off-site archiving. Having a slave db on a backup server at a different physical location, even if it is just a server in your back bedroom, is one answer. Otherwise it's an archive file that would need to be installed elsewhere to be used. And DNS will need to be updated to the new location whichever strategy you adopt.
This thread started out about backup for a db, not fail-over with guaranteed uninterrupted service. The best backup is the one that is most current, and master/slave replication can keep a backup current to within seconds or minutes, rather than the hours or days that tar, tape and sqldump can manage.
m@tt wrote:Thank you all for your wisdom
I searched this board and this is probably the best thread related to the subject.
We are going to go with replication to a local box in our office and run backups (mysqldump->gzip) on the local machine so that it does not interfere with our active database. I think this is the best solution for us.
Glad we could be of help Matt. That's a good solution you have chosen.
(My last post was to csn - we cross-posted.)
Ohh, the copy-and-paste method using a batch file at midnight works pretty darn well for me.
Then I just copy that to tape or an external disk.