Raw Thought

by Aaron Swartz

Lazy Backup

If there’s one thing good UI designers know, it’s that the best UI is not to have one at all. Applications should just save, security should just work, and computers should just back up.

Apparently that last task is harder than it appears, since I still haven’t found decent backup software for Unix (OS X and GNU/Linux).

Here is how the software should work:

  1. I install it.

  2. I point it at some storage server (ideally Amazon EC2 and S3, but if that’s too hard then a GNU/Linux server with a large drive).

  3. I give it a maximum space limit (e.g. store no more than 200GB).

  4. I give it a maximum up-bandwidth limit (e.g. use no more than 5K/s).

  5. I tell it to run.

From then on, it should just work. In the background, it will upload my files to the server using only 5K/s of bandwidth. If I get disconnected from the Internet or reboot my computer, when I get back on it will pick up where it left off. If a file changes it will only send the diff and store that as well. When I run out of disk space it will delete the old diffs.

It will preserve all the Unix ACLs and permissions and weird Mac OS X resource forks and stuff so that if my drive ever dies I can make a full bootable restore from the backup.

Does this software exist?

The closest I’ve seen is rdiff-backup, which is very nice but fails to automate some key steps.

If it doesn’t exist, let me know if you’re interested in writing it (a wrapper around rdiff-backup to do it shouldn’t be too hard, I would think). I’d be willing to offer a bounty.
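To be concrete, here is roughly the shape of the thing I have in mind: an untested sketch, assuming the server accepts rdiff-backup over ssh and that something like the trickle rate-limiter is available to do the throttling.

    #!/bin/sh
    # Untested sketch of the wrapper idea. Assumes rdiff-backup is
    # installed on both ends, the server is reachable over ssh as
    # backup@server, and trickle is available locally.
    DEST="backup@server::/backups/laptop"

    # Upload at roughly 5KB/s. If the connection drops or the machine
    # reboots, the next run just makes a fresh increment against the
    # last successful backup.
    trickle -s -u 5 rdiff-backup --exclude /tmp --exclude /proc / "$DEST"

    # Crude stand-in for a real "no more than 200GB" policy: throw
    # away increments older than six months.
    rdiff-backup --force --remove-older-than 6M "$DEST"

Run that from cron every night and points 1 through 5 are mostly covered, except that nothing here knows the actual disk budget on the server.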

November 29, 2006

Comments

I like Deja Vu (which comes bundled with Toast). I just schedule daily backups to my server and forget about it. It’s completely faceless and just backs up my home folder without bothering me as long as I have my server volume mounted (which I always do anyway).

posted by Mike Cohen on November 29, 2006 #

rsync can do pretty much everything you ask for (although I’m not sure if it can handle the MacOS stuff unless it’s stored in a regular file). I have a setup where everything you mention is in place except for the bandwidth and disk space limitations (okay, it uses rsync over ssh and cron to schedule the jobs). rsync does support a bandwidth limit option, and with du, another cron job can go through and clear out old backups.
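For instance, the core of that setup can be as small as this (host and paths invented; assumes ssh keys are set up so cron can run it unattended):

    # client crontab: nightly rsync over ssh, throttled to about 5KB/s
    0 3 * * * rsync -az --delete --bwlimit=5 -e ssh /home/aaron/ backup@server:/backups/aaron/

    # on the server, a second cron job can watch usage with du and
    # prune the oldest snapshot directories when space gets tight
    du -sh /backups/*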

posted by Sean Conner on November 29, 2006 #

http://www.jungledisk.com/ is an easy way to use Amazon S3. I currently run a shell script to rsync to an external hd for backup, but you are right, there is no good solution. Maybe the Time Machine feature in the next mac os release will help.

posted by Keizo Gates on November 30, 2006 #

Would you please make another blog post if you find the solution to this problem or, as you propose, it finds you?

posted by Niels Olson on November 30, 2006 #

Duplicity isn’t quite what you want, Aaron, but it is closer than anything else I’ve seen in the Linux space.

posted by Luis Villa on November 30, 2006 #

Are you looking for something that will upload your files as you save them, or something that runs from a nightly cron or similar? It would be really nice if it could detect when files were modified, instead of having to scan your entire filesystem for changes (which purportedly Time Machine will be able to do, but it looks limited in other ways).

On my last machine, I used rsnapshot. It makes incremental backups, and handles scheduling for you, which is nice. Also, you don’t need any special tools to restore the backups.

But I have just switched to a macbook. As it happens, my new backup drives arrived in the mail this very afternoon. I think I’ll give rdiff-backup a try.

Anyways, I’m pretty interested in your bounty. I wonder if I can find the time. It’d be great if you provided more details about what exactly you wanted.

posted by David McCabe on November 30, 2006 #

You may wish to consider pull backups — where the backup server periodically connects to the source and downloads the data for backup, rather than the client explicitly uploading.

Otherwise, if the client has read/write access to the backup medium, an attacker, worm or user/software error on your machine would be able to damage or tamper with your backups.

The ideal solution that I’ve been thinking about is a ‘backup appliance’ — a simple unattended machine that will periodically connect to my primary data store(s) and take an incremental snapshot, ideally via rsync-over-SSH.

It shouldn’t be too hard — Linksys NSLU2 devices already seem to provide most of my desired functionality, with configuration via HTTP and support for backing up SMB shares. (They’re embedded Linux machines, and can be reinstalled with more general-purpose firmware, eg from http://www.nslu2-linux.org/)

Multiple backup appliance instances could be deployed (in different physical locations) for added redundancy.

In the case of a laptop as the source, one issue would be host discovery and connectivity - if the laptop is on a non-routable IP address or inbound port 22 connections aren’t available (which is common), then the backup appliances may have problems connecting. Some kind of backwards SSH tunnelling would probably solve this problem.
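The tunnelling piece might look roughly like this (hosts and the port are invented):

    # on the laptop: hold open a reverse tunnel so the appliance can
    # reach the laptop's sshd even behind NAT
    ssh -N -R 2222:localhost:22 backup@appliance.example.org

    # on the appliance: pull a snapshot back through that tunnel
    rsync -az -e 'ssh -p 2222' user@localhost:/home/ /snapshots/laptop/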

Similarly, bandwidth on the client may vary — some mechanism to moderate the bandwidth used by the backup appliances from the client-side would be desirable.

But I haven’t built it yet.

posted by David McBride on November 30, 2006 #

Brackup - http://brad.livejournal.com/tag/brackup. To limit bandwidth, use trickle.
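For example (the --from/--to names refer to sections of a brackup config and are only placeholders):

    # cap brackup's upload at 5KB/s using trickle in standalone mode
    trickle -s -u 5 brackup --from=home --to=amazon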

posted by James on November 30, 2006 #

I’ve been using rsyncbackup (http://rsyncbackup.erlang.no/) for about 2 years on a mac.

It will work over a network (since it’s rsync under the hood).

I personally use it on a local volume since it can handle incremental backups quite well using hard links.

As for the space limit problem, there is no good solution, but you can always delete older backups in a cron job.

posted by Colin on November 30, 2006 #

You MUST, AND I MEAN MUST use incremental backups. Having just one copy is completely useless as a corrupted file is gone forever. Likewise for an accidentally deleted one, when you discover the problem after a backup.

Use a linux based solution that can do it via hard-links. This means you can cd into each directory and see a copy of the full set, without it taking up lots of space for the same data. For this reason I have been using RIBS (RSync Incremental Backups) for several years now. I have a copy of my home directory available going back three years or so.
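The underlying trick, independent of RIBS, is roughly this (directory names are only illustrative):

    # unchanged files in today's tree become hard links into
    # yesterday's, so each "full" copy costs almost no extra space
    rsync -a --delete --link-dest=/backups/2006-11-29/ /home/ /backups/2006-11-30/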

It doesn’t have a GUI, so you do need to edit the script to set the directory to back up and the host to SSH to. Not complicated. Oh, and you need to set up cron to perform the backups.

It will pick up from where it left off if interrupted, and it will also only transfer the changes. I use it to back up a 120GB primary storage area every single night, consisting of files ranging from a few kilobytes up to several hundred megabytes.

posted by Fraser on November 30, 2006 #

A general-purpose bandwidth limiter would make an interesting utility in its own right. So many apps need network connectivity; I’m imagining a tool that would let me define up/down limits for specific applications and forget about it. At home we’re on dialup and WinXP, and being able to limit everything but Firefox (i.e. keep all the auto-update checkers from getting in the way) would be huge.

Reading the trickle paper makes it sound like a general solution would be non-trivial. It may be easier to make something like a modified firewall and send the backup out over a specific port every time.

I’m interested to hear what you come up with.

posted by AdamB on November 30, 2006 #

Rsnapshot - http://www.rsnapshot.org.

I’ve been using it for 3+ years without a problem, backing up multiple colocated servers to a dedicated machine here at the house with lots of cheap disk.
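The setup is small. A sketch of the pieces involved (host and paths invented; note that rsnapshot’s config file wants tabs between fields, not spaces):

    # the relevant bits of /etc/rsnapshot.conf
    snapshot_root   /backups/
    cmd_ssh         /usr/bin/ssh
    interval        daily   7
    interval        weekly  4
    backup          root@colo1.example.com:/home/  colo1/

    # and the matching crontab entries
    30 3 * * *   /usr/bin/rsnapshot daily
    0  4 * * 0   /usr/bin/rsnapshot weekly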

posted by Bill Bradford on November 30, 2006 #

I’ve been using unison (http://www.cis.upenn.edu/~bcpierce/unison/) for a couple years with no problems. It runs on Windows, Linux and OS X.

The great thing about it is that it does masterless replication, so I can create files on any machine and the rest of the machines eventually get them. Of course, that’s the downside, since you can’t really just let it run and forget about it since you have to manually deal with merge conflicts (pretty rare in practice). You might be able to set up a cron job and have it skip conflicts, I haven’t tried that.
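Untried, but the unattended variant would presumably be something along these lines (paths are placeholders):

    # from cron: -batch asks no questions, propagates non-conflicting
    # changes, and leaves conflicts alone for a later interactive run
    unison /home/aaron ssh://otherhost//home/aaron -batch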

posted by Christian R. Simms on November 30, 2006 #

I’ve been trying to work out a way to get svn to do this.

The idea is to run a cron job to periodically check in the entire drive.

Svn marks changed files on the fly so it wouldn’t have to scan the drive.

I’m sure some linux genius could work out how to throttle down the connection that the job runs in.

Joe

posted by Joe Harris on November 30, 2006 #

You MUST, AND I MEAN MUST use incremental backups.

I agree with Fraser on this point, but I believe Dirvish to be a much better solution than RIBS:

http://www.dirvish.org/

  • John Quigley

posted by John Quigley on November 30, 2006 #

libsloth http://ftp.die.net/pub/libsloth/libsloth.c is a crude but effective bandwidth limiter.

How to use:

env LD_PRELOAD=/path/to/libsloth.so BYTES_PER_SECOND=1000 command arg1 arg2 … argN

Here ‘command’ is a “dynamically linked, non-threaded TCP client or server that you wish to rate-limit”.

libsloth actually “separately rate limits all read()s and write()s to each file descriptor regardless of whether it’s a network socket or not.”

posted by Josh Purinton on November 30, 2006 #

If you’re looking for a linux appliance, might I recommend Buffalo Technology’s Linkstation Pro? It’s faster by far than the Linksys you mention, and I believe it’s cheaper too. And Buffalo’s tech support is absolutely unbeatable. The Linkstation has a very active hacking community.

posted by David McCabe on November 30, 2006 #

Check http://www.carbonite.com - pretty much does exactly that.

posted by on November 30, 2006 #

heh, except you said UNIX. Well, they have it for windows at least.

posted by on November 30, 2006 #

Not linux compatible, but Windows and Mac.

http://www.seagate.com/products/retail/mirra/index.html

I haven’t tried this but it allegedly backs up as you save.

posted by rick on November 30, 2006 #

It depends on how much you care about OS X metadata. As far as I know

http://www.shirt-pocket.com/SuperDuper/SuperDuperDescription.html

is the only backup tool for OS X that does what you want and also keeps all metadata intact.

It creates a .dmg file on a mounted network drive and backs up to the .dmg file.

I’m unaware of any cross platform backup tool that will backup OS X files 100% correctly.

posted by david mathers on November 30, 2006 #

It’s a commercial app in beta, but it’s a “set and forget” app that only copies the changed bytes (similar to Rsync) and encrypts the data on your machine before uploading it. I have an interest in this, so I am certainly not impartial:

www.vionobackup.com

On another note, though, I have used Rsync for a long time on the Mac. It’s a great tool. My only issue was that my data was not encrypted at the server end, so that’s why we decided to write our own. It’s only in beta and available for Mac and Windows, so I’d appreciate any feedback.

posted by Ade on November 30, 2006 #

I second rsnapshot (http://www.rsnapshot.org/)

posted by daniel on November 30, 2006 #

Try looking at ZFS:

http://www.opensolaris.org/os/community/zfs/

“ZFS backup and restore are powered by snapshots. Any snapshot can generate a full backup, and any pair of snapshots can generate an incremental backup. Incremental backups are so efficient that they can be used for remote replication — e.g. to transmit an incremental update every 10 seconds.”
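In practice that workflow is roughly (pool, dataset and host names invented):

    # take periodic snapshots, then ship only the delta between two of
    # them to another machine
    zfs snapshot tank/home@2006-11-30
    zfs send -i tank/home@2006-11-29 tank/home@2006-11-30 | \
        ssh backuphost zfs receive tank/backups/home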

Snapshots are just one of many nice features ZFS offers. That said, it’d take some serious work to get it working on your Mac, assuming it’s possible at all.

However, you may just have to wait a while, as it might appear in Apple’s next OS release:

http://www.oreillynet.com/onlamp/blog/2006/08/proof_that_os_x_leopard_will_u.html

posted by Nick on November 30, 2006 #

I use backupninja, which may be just the light layer of user-friendliness on top of rdiff-backup you’d like.

posted by Andy on November 30, 2006 #

Boo for using _ for italics…

O’Reilly link

posted by Nick on November 30, 2006 #

+1 for rsnapshot. One additional thing I didn’t see mentioned is that Mac OS X’s default rsync binary (which would be doing the gruntwork for rsnapshot) takes a -E flag to properly capture resource forks and other extended attributes of HFS.
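That is, something along these lines (paths invented):

    # Apple's bundled rsync: -E carries the resource forks and other
    # extended attributes along with the files
    rsync -aE /Users/aaron/ /Volumes/Backup/aaron/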

posted by Cody Raspen on November 30, 2006 #

Aaron,

It’s commercial software, but Storage Exec from Veritas/Symantec does file level replication between a source and target share/file system. It can throttle CPU and bandwidth utilization.

BackupExec with Continuous Data Protection has similar functionality, with an agent.

Your data is most important, but the ability to perform a fast, reliable recovery of a system is critical for business use. For this purpose, I bought Acronis when my daughter went to college with her laptop. It creates a boot CD (Linux-based) that allows network and USB connections to restore an image to the machine. Backup rates are ~1 Gig/min for USB 2. Restores take a bit longer, but they are 100% reliable.

Echoing a theme from previous posts, Acronis supports an incremental backup feature and you can schedule using the Acronis scheduler.

posted by Paul Begley on November 30, 2006 #

Try Bacula. I am using it to back up several Linux and Windows servers on our LAN and it’s perfect. It does everything you are asking for and more. I pretty much forget about it until I rm -rf * in the wrong directory, and then it restores it easy as can be. It has a “director” on one (Linux) machine and you schedule it to connect to the other machines and do incremental backups onto another storage machine or backup device. It is very flexible and reliable.

posted by Paul Viren on December 1, 2006 #

Another thing that’s tricky about setting up backups is getting the includes and excludes just right. Don’t want to leave anything out (I’ve done it; obscure non-home-dir file that I really needed), but don’t want to waste space on, say, iMovie’s included themes.

It would be great if somebody published a ready-made black/whitelist for Mac OS X, or maybe even a script that examined your system and wrote one.
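Even a rough starting point would help. For example, in rsync --exclude-from syntax, with only the obvious candidates listed (a guess at a useful subset, not a vetted list):

    # per-user noise
    .Trash/
    Library/Caches/
    # system-wide noise
    /private/var/vm/
    /tmp/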

posted by David M. on December 1, 2006 #

Christian R. Simms,

I had the idea to use svn to do incremental backups too, but I hit some walls pretty fast:

  1. It’s quite slow with binaries.

  2. You will have big-time fun with permissions.

  3. svn puts a .svn folder in every folder.

Except for those 3 points, it could make a decent backup system.

my 2¢

posted by h3 on December 1, 2006 #

Update: rdiff-backup (1) has no pretend mode, (2) is ludicrously slow on small files, and the real showstopper, (3) doesn’t work over mounted network volumes — but doesn’t tell you that until it’s finished its backup, only to leave it in an inconsistent and unusable state after all those hours.

posted by David McCabe on December 2, 2006 #

I’ve used Amanda (http://www.amanda.org/) with success; it uses (as much as possible) the client’s command line tools. There is no GUI, but it gives you nightly email reports if you want. It stages backups to a local hard drive before committing them, so there is some automatic fall-back in case your backup device isn’t there.

I haven’t used it to back up to hard drives, but it does seem to understand bandwidth, backup windows, priority by machine and by partition, etc.

I don’t know if it does everything you want, but I have similar feelings about software and it works for me.

posted by David Rouse on December 3, 2006 #

So this is what I’ve come up with: rdiff-backup, besides being a shady and un-feel-good program, doesn’t handle lots of small files, at all. So I’m making a disk image with superduper, and then incrementing the image with rdiff-backup. We’ll see how it goes. Superduper is super-slow, and also shareware, so it’d be cool if somebody knew of a similar program that works better or is free. It’s a front-end for ditto(1) from what I understand.

posted by David McCabe on December 10, 2006 #

try backupninja and ninjahelper http://dev.riseup.net/backupninja/

posted by jp on December 15, 2006 #

I’m surprised nobody has mentioned backuppc (http://backuppc.sourceforge.net). It pulls backups down through the network, does incremental backups, and even pools files across PCs, so that only one copy of any given file is stored in the backup. I have it backing up 9 Linux boxes right now, 434 GB of backups being stored on a 120GB RAID 1 array, with 14 GB available…

Nice web interface for restoring backups, automatic notification mails if it’s unable to do a backup for an extended period of time, only backs up machines after work hours if they’re constantly online—but if they’re not, will pick up where it left off as soon as it sees a machine that has been offline…

It copies using Rsync or Samba, so it’ll back up Windows, too, if you need it to… I have it running on a 10-year-old dual 500MHz server that can’t really be used for anything else…

Haven’t figured out how to cap its size/bandwidth, but I find it incredibly easy to (not) use, and have even used it to restore files for clients with a few clicks…

Cheers, John

posted by John Locke on January 1, 2007 #

CrashPlan might come close to what you’re looking for (Mac OS X). You point it to a server, give it an amount of bandwidth to use, and tell it to go, and it backs up most recent changes first and then the rest of your stuff. When files change, it does a diff and only backs up the bytes that have changed, since bandwidth is a bottleneck. Here’s a review at TidBITS:

http://www.tidbits.com/tb-issues/TidBITS-868.html#10

posted by Scott Teresi on February 28, 2007 #

CrashPlan isn’t just for Mac OS X… it runs on Linux and Windows also (actually, Linux is “coming soon”)

posted by on February 28, 2007 #

I’m currently looking into backup programs as I want to back up a Mac and a Windows laptop to an NSLU2 running Debian Linux.

Boxbackup sounds like it does what you want (although I’m not sure about the bandwidth limiting). It claims to support Mac OS resource forks, although I’ve not tried this yet. A blog post on Mac backups (http://blog.plasticsfuture.org/2006/04/23/mac-backup-software-harmful/) put the fear into me about correct handling of the Mac’s mess of metadata. I’d love to see the guy’s set of torture test files for backup tools.

rdiff-backup was horribly slow for me: it took about an hour for an incremental backup to decide to do more or less nothing. I’m not sure whether that’s because the NSLU2 is slow and the comparisons are being done on the server side, or what. I’m going to look into Bacula and Boxbackup when I get a moment. Bacula isn’t the sort of lazy backup you talk of, but seems to be well maintained and used by serious sysadmins, so I’m inclined to trust it over the various rsync based scripts which are out there.

posted by Paul Wright on February 28, 2007 #
