Problem: protecting your user’s data

I recently wrote about using Tarsnap to back up a Linux server, and that feels especially important given the recent revelations about PRISM. Tarsnap is “backups for the truly paranoid”, but it’s only part of the story and perhaps isn’t the best route for every server.

With a Rails server, you can mostly forget about the code (it’s on GitHub), the machine configuration (it’s recreated by Puppet/Chef and Capistrano) and all the temporary stuff (PIDs, logs etc.). What really matters is the user data, which is typically stored in one or more database systems and perhaps on the filesystem (e.g. uploaded files). It’s not enough to simply snapshot the database files while the server is running, as this can produce a corrupt backup. You need to dump the database properly (e.g. using pg_dump for Postgres), and you’ll want to package this up into a little script that runs on a schedule and ships the backups offsite.
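A hand-rolled version of that script might look something like this (the database name, paths and the shipping command are all illustrative placeholders, and the actual dump/upload lines are left commented out; the Backup gem described below replaces scripts like this):

```shell
#!/usr/bin/env bash
# Hypothetical manual backup script -- database name, paths and the
# shipping command are placeholders, not taken from a real setup.
set -euo pipefail

DB_NAME="app_production"
BACKUP_DIR="${TMPDIR:-/tmp}/backups"
STAMP="$(date +%Y-%m-%d_%H%M)"
DUMP_FILE="${BACKUP_DIR}/${DB_NAME}_${STAMP}.sql.gz"

mkdir -p "$BACKUP_DIR"

# pg_dump produces a consistent dump even while the app is writing;
# piping through gzip keeps the archive small. (Commented out here.)
# pg_dump "$DB_NAME" | gzip > "$DUMP_FILE"

# Ship the dump offsite, e.g. with s3cmd (assumed installed and configured):
# s3cmd put "$DUMP_FILE" "s3://my-backup-bucket/db/"

echo "$DUMP_FILE"
```

You would then schedule this from cron; the Backup gem bundles the dump, compression, encryption and shipping steps so you don’t have to maintain this glue yourself.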

Solution: The Backup gem

Enter the Backup gem, which ties together collectors (file archives and database dumps), compressors and encryptors, and storage destinations (e.g. Amazon S3, Dropbox, FTP, local files etc.) into a workable backup system. It’s all configured through a single file, which makes it easy to grok your whole backup strategy at a glance.

Instructions

Install and configuration

Install the backup gem:

[code lang=bash]
gem install backup
[/code]

Generate a backup configuration for your app (your app will vary, and there’s good documentation here):

[code lang=bash]
backup generate:model -t my_backup --archives --databases=postgresql,redis --compressors=gzip --encryptors=gpg --storages=sftp,s3 --notifiers=mail
[/code]

This will create a configuration file in ~/Backup/models/my_backup.rb.

Now, edit the configuration file with your database password etc. This is what a fairly basic one looks like in nice readable Ruby code:

[code lang=ruby]
Backup::Model.new(:appname_production_backup, 'Backup the production database and uploads directory') do
  split_into_chunks_of 250

  archive :uploads do |archive|
    archive.root 'app_path'
    archive.add "app_path/shared/uploads"
  end

  database PostgreSQL do |db|
    db.name               = "app_name_production"
    db.username           = "db_user"
    db.password           = "db_password"
    db.socket             = "/var/run/postgresql"
    db.additional_options = ["-xc", "-E=utf8"]
  end

  store_with Local do |local|
    local.path = "~/backups/"
    local.keep = 50
  end

  compress_with Gzip

  notify_by Mail do |mail|
    mail.on_success = true
    mail.on_warning = true
    mail.on_failure = true
    mail.from       = "gmail@address"
    mail.to         = "to@email.address"
    mail.address    = "smtp.gmail.com"
    mail.port       = 587
    mail.domain     = "gmail@address"
    mail.user_name  = "gmail@address"
    mail.password   = "gmail_Pa55w0rd"
    mail.authentication = "plain"
    mail.encryption = :starttls
  end
end
[/code]

I put the backup configuration in my Rails config directory but this does mean that it’s deployed on both the production and staging servers. I add a guard at the top of the backup config file to check the RAILS_ENV:

[code lang=ruby]
return Backup::Model.new(:_production_backup, 'Nothing'), {} unless ENV['RAILS_ENV'] == 'production'
[/code]

Schedule

Next, you’ll want to automate the backup process. I use the whenever gem to schedule cron tasks in my application so I added the following line:

[code lang=ruby]
job_type :envcommand, 'cd :path && RAILS_ENV=:environment :task :output'

every 1.day, at: '0430' do
  envcommand 'backup perform -t appname_production_backup -c ./config/backup/config.rb'
end
[/code]

Notice that I use the -c flag to reference the backup configuration inside the current directory (i.e. the Rails dir). By default, backup expects the configuration to be stored in ~/Backup/models/my_backup.rb, so we just need to point it at the configuration file inside our Rails app directory. Doing this has the advantage that I can update the backup strategy using the familiar Capistrano workflow.
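For reference, the entry that whenever installs into the crontab (via `whenever --update-crontab`, which Capistrano can run on deploy) looks roughly like this; the application path is an illustrative placeholder:

```shell
# Approximate crontab entry generated by whenever for the job above:
# 30 4 * * * /bin/bash -l -c 'cd /var/www/appname/current && RAILS_ENV=production backup perform -t appname_production_backup -c ./config/backup/config.rb'
```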

I would recommend setting these mail options

[code lang=ruby]
mail.on_success = true
mail.on_warning = true
mail.on_failure = true
[/code]

for a few days so that you’ll receive (hopefully) comforting emails every day and can verify the backups before turning off mail.on_success and letting it work away silently.

For the moment, I’m actually storing the backups locally and relying on the hosting provider’s regular backups to copy them offsite. As the volume and importance of the data grows, I’ll probably convince the client to add an encrypted S3 backup, but that’s only a few more lines of code :-) You could, of course, also point Tarsnap at the backups directory for very secure offsite storage.
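Those few extra lines would look something like this, added to the model above. The bucket, region, credentials and key are all placeholders, and the GPG options follow the gem’s asymmetric mode as I understand it, so check them against the documentation for your version:

```ruby
# Hypothetical additions to the Backup model -- all names and
# credentials below are placeholders.
store_with S3 do |s3|
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  s3.region            = "eu-west-1"
  s3.bucket            = "my-backup-bucket"
  s3.path              = "/backups"
end

encrypt_with GPG do |encryption|
  # Encrypt to a public key so the private key never lives on the server.
  encryption.keys = { "backups@example.com" => <<-KEY }
    -----BEGIN PGP PUBLIC KEY BLOCK-----
    ...
    -----END PGP PUBLIC KEY BLOCK-----
  KEY
  encryption.recipients = "backups@example.com"
  encryption.mode       = :asymmetric
end
```

Decrypting a backup then only needs the matching private key on your local machine, which keeps a compromised server from reading its own archives.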

Read more

The backup gem has some really excellent documentation that’s worth reading. It’s also possible to extend the functionality by writing new handlers (e.g. to back up a new database system).