Files
David Eisinger 8e5e3781af Fix code fences
2025-02-16 15:01:33 -05:00

62 lines
2.5 KiB
Markdown

---
title: "Backup your Database in Git"
date: 2009-05-08T00:00:00+00:00
draft: false
canonical_url: https://www.viget.com/articles/backup-your-database-in-git/
---
**Short version**: dump your production database into a git repository
for an instant backup solution.
**Long version**: keeping backups of production data is fundamental for
a well-run web application, but it's tricky to maintain history while
keeping disk usage at a reasonable level. You could continually
overwrite the backup with the latest data, but you risk automatically
replacing good data with bad. You could save each version in a separate,
timestamped file, but since most of the data is static, you would end up
wasting a lot of disk space.
When you think about it, a database dump is just SQL code, so why not
manage it the same way you manage the rest of your code --- in a source
code manager? Setting such a scheme up is dead simple. On your
production server, with git installed:
```sh
mkdir -p /path/to/backup
cd /path/to/backup
mysqldump -u [user] -p[pass] --skip-extended-insert [database] > [database].sql
git init
git add [database].sql
git commit -m "Initial commit"
```
The `--skip-extended-insert` option tells mysqldump to give each table
row its own `insert` statement. This creates a larger initial commit
than the default bulk insert, but makes future commits much easier to
read and (I suspect) keeps the overall repository size smaller, since
each patch only includes the individual records added/updated/deleted.
From here, all we have to do is set up a cronjob to update the backup:
```
0 * * * * cd /path/to/backup && \
mysqldump -u [user] -p[pass] --skip-extended-insert [database] > [database].sql && \
git commit -am "Updating DB backup"
```
You may want to add another entry to run
[`git gc`](http://www.kernel.org/pub/software/scm/git/docs/git-gc.html)
every day or so in order to keep disk space down and performance up.
Now that you have all of your data in a git repo, you've got a lot of
options. Easily view activity on your site with `git whatchanged -p`.
Update your staging server to the latest data with
`git clone ssh://[hostname]/path/to/backup`. Add a remote on
[Github](https://github.com/) and get offsite backups with a simple
`git push`.
This technique might fall down if your app approaches
[Craigslist](http://craigslist.org/)-level traffic, but it's working
flawlessly for us on [SpeakerRate](http://speakerrate.com), and should
work well for your small- to medium-sized web application.