Backup strategies
23 January 2022
Around April 2019 I invested in an ASUS BW-16D1H-U PRO Blu-Ray writer which at the time set me back around €170, and in hindsight it is the only backup solution I have tried that really worked well. At the time I had realised that backing up was something I had never made a proper job of, and with the short career timeout it was one of the major things I wanted to get sorted. I had probably noticed that it had been five years since my last major storage failure, and ten years since the one before that, so it was something I should not tempt fate over by putting off any further.
Recently my personal workstation seemed to be running significantly slower during file operations than I remember it being in the past, and being unsure whether I was just used to my much newer professional workstation or whether it was an early sign of burnout, I felt it was time to create a complete snapshot. I tend not to use the system for much these days, so this snapshot will likely be used for data migration once I get round to building a new personal workstation, for which I am currently waiting on Slackware 15 to ship.
Building a file-server
Back in my university days it was not unusual for me to run multiple computers, and even then they were frequently multi-boot systems, a situation that is rare these days with the widespread adoption of virtualisation. At some point I decided that I needed a centralised place for all my files, so in 2006 I built a dedicated file-server, the idea being that I could then back up all important files in one go. Things never quite worked out this way, but the process was a major step in the right direction of not having data scattered all over the place, and in later years the infrastructure would prove valuable. This file-server was decommissioned after a major data-collation effort and so far I have been sceptical of the practical benefits of commissioning a new one. I experimented with various forms of removable media while the file-server was in commission but they were all impractical.
Tape drives
I forget how much it actually cost, but I did manage to find a reasonably priced tape drive back in the mid-2000s; as it turned out the manufacturer folded shortly after I got it, and having only one tape meant it quickly became a failed experiment. Since then, even though the tapes themselves are very cheap, the drives are incredibly expensive and usually need higher-spec interfaces such as SCSI/SAS, which drives the capital cost even higher. As a result tapes only make economic sense if you are regularly backing up many terabytes of data and need to keep multiple snapshots, such as daily/weekly/monthly cycles. Few individuals need this sort of thing and even most companies do not need this level of fall-back.
DVDs
Back in 2006 blank DVDs were not exactly cheap, and I remember wasting an entire pack working out how to get DVD burning working with proper direct memory access speeds; once I got this working I do not recall ever using that machine to burn anything to disc again. I am unsure whether I seriously looked into incremental backup software at the time, but in hindsight backing up a 400GB file-server at 4.7GB per disc was never going to happen due to the effort of making all the disc images and then burning them. There is a role for DVDs in backing up but it is limited to mini-snapshots rather than being the back-bone of mass storage.
Removable hard drives
I came to the conclusion that the only thing I could really use to back up a large hard drive was another large hard drive, but that had two problems: convenience and cost. Even though the SATA specification includes hot-swap, it was unclear which controllers and hard drives actually supported this functionality, and although I think all modern ones do I suspect back then this was not the case, so taking a backup would involve shutting down the machine. As for cost, at the time a single large hard drive was in itself a pretty big investment for me, and even today it is not a cheap way of having multiple backups, so I never adopted this approach. In hindsight it would have been functional, but it is certainly not elegant, especially given the wear-and-tear on connectors.
Switching to RAID
By 2009 I was keeping active projects on the file-server and accessing them using network drives, but I had pretty much given up actually trying to back it all up, and then one day I had a bit of a scare with a major hard drive fault. At the time my suspicion was that the read/write head had gone berserk, and it was quite likely down to my partitioning scheme, which kept things like log files well away from the actual data area, that I did not lose anything important. Even after this incident it was clear that I would never get round to regular backups, so instead the file-server was rebuilt using RAID, as this sort of hard drive failure is precisely the scenario RAID is supposed to protect against.
I have over the years looked at the fancier RAID schemes, but I always concluded that accepting the inefficiency of RAID-1 (mirroring) was the best economic choice: going from two-drive RAID-1 to three-drive RAID-5 raises storage efficiency from 50% to 67% of raw capacity, a roughly 33% relative increase in usable space per drive, but at a 50% increase in hardware cost. Additionally I am not even sure that three-drive RAID-5 is more reliable than two-drive RAID-1, since the odds of both drives in a pair failing are less than the odds of at least two failures among three drives, something known as the airplane rule. In any case, by the time I would actually need that extra bit of space it would be time to get a fresh pair of larger drives.
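As a rough sanity check of that reliability claim, the odds of losing a two-drive mirror (both drives fail) can be compared against a three-drive RAID-5 (any two or more fail), assuming independent failures and an illustrative per-drive failure probability of 5%:

```shell
# Compare data-loss odds: RAID-1 pair vs three-drive RAID-5,
# assuming independent failures with an illustrative p = 0.05.
awk 'BEGIN {
    p = 0.05
    raid1 = p * p                        # both drives of the pair fail
    raid5 = 3 * p*p * (1-p) + p*p*p      # at least two of three fail
    printf "RAID-1 loss odds: %.5f\nRAID-5 loss odds: %.5f\n", raid1, raid5
}'
```

With these numbers the mirror comes out roughly three times less likely to lose data, matching the airplane-rule intuition that fewer components means fewer ways to fail.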
Preparation for New Zealand
The only time the file-server was ever truly backed up was when I used it in my archival operations prior to emigrating to New Zealand. Even before this point I had often used it to make copies of drives that I suspected were in a dodgy state, and one actually failed half-way through copying off all the files, but this time round the purpose was to create a snapshot of all my data. Since I was going to the other side of the planet, anything that was not part of this snapshot I would basically have no access to. The file-server was then mirrored onto a portable hard drive, and I remember the final synchronisation check using rsync taking around 15 minutes; this portable drive I took with me as hand luggage, and I made another copy of it once I got to New Zealand. Because of this I finally had backups, both on-site and off-site, of practically all my data.
Pretty much anything from prior to 2013 that is not part of this archive I now consider lost, and in subsequent years of digging through this archive I have actually found a software project that I thought had been lost, as well as stuff from the 1990s I had completely forgotten about. Moving to New Zealand meant using a single system and resorting to VirtualBox rather than dual-booting, and this is the setup I kept for the next 8–9 years, so there was no point in having a separate file-server.
Rebuilding the file-server?
While I have all the parts in stock, including some 4TB hard drives, required to build a new file-server, I am doubtful of the practical usefulness of commissioning one. The fundamental problem is how to unconditionally back up that quantity of data on a regular basis, and if I am not going to do that, what is the purpose of having a file-server with that much storage? Yes, I could use it to regularly back up other systems, but then I have to consider what range of data-loss scenarios I wish to protect against and where I draw the line. Realistically the only way to back up a file-server is with another file-server, and with RAID protecting against hard-drive crashes this only makes sense if the latter server is off-site.
A lucky escape
At some point in late 2013 I made a complete copy of my personal workstation onto a portable hard drive, and then in 2014, just before I flew out to a Chinese New Year party in Bristol, I decided to burn a few files to DVD; in hindsight this meant that a major data-loss incident resulted in relatively little loss of irreplaceable data. When I got back I found that my solid-state drive had suffered a catastrophic failure that I estimate wiped 60% of all the storage space, including the partition table. Fortunately heavy-duty data recovery using testdisk, coupled with the last-minute backup to DVD, meant that very little important data was lost; yes, I lost things that were important, but it was not the wipeout it could easily have been.
At some point after this incident I started using a portable hard drive onto which I would use dar to incrementally back up my system, which in hindsight was at least convenient enough that I would do it on a semi-regular basis. This was a functional system, so I am not entirely sure when and why I eventually gave up on it; I suspect it was around when going to the gym was taking up my spare time, and afterwards I had a tendency to use my company laptop for personal stuff, so I stopped bothering to back up my personal workstation this way. I probably copied a few things to either CDs or USB drives in the meantime but nothing systematic.
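A minimal sketch of the same full-plus-incremental idea, using GNU tar's --listed-incremental mode as a stand-in for dar (the exact dar invocation is not recorded here, and the paths are illustrative):

```shell
# Full backup: tar records file state in a snapshot file alongside the archive.
tar --listed-incremental=/backup/home.snar -czf /backup/home-full.tar.gz -C / home

# Later run: only files changed since the snapshot was written get archived.
tar --listed-incremental=/backup/home.snar -czf /backup/home-incr1.tar.gz -C / home
```

Note that the snapshot file is updated in place on every run, so each archive is relative to the previous one; to take repeated differentials against the full backup instead, work on a copy of the snapshot file.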
Choosing Blu-Ray
While not cheap, I felt that Blu-Ray writers had reached a price point where getting one could be justified, and I chose the ASUS one because it followed the BDXL specification that allows for quad-layer discs of 100GB. While 100GB discs even today are about £18 each, which is uneconomic because for that price you can get a 120GB SSD, the price of the 50GB discs was falling rapidly and the basic single-layer 25GB ones were already essentially a commodity. Although my personal workstation had a total of 350GB it was split into multiple partitions, and bzip2-compressed tarfiles meant that all but one of them was able to fit onto either 25GB or 50GB discs in their entirety; the one exception also fitted after excluding some movie files that I already had separate copies of. Using Blu-Ray discs was the first time I had the sort of off-site backup that would cover disaster-recovery.
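The per-partition snapshots described above can be sketched as one compressed tarfile per mount point, staged and then burned in one go (all paths and the exclusion pattern are illustrative, and the growisofs step needs an actual burner):

```shell
# One bzip2-compressed tarfile per partition, staged for burning.
tar -cjf /staging/home.tar.bz2 -C / home
tar -cjf /staging/srv.tar.bz2 --exclude='*.mkv' -C / srv

# Burn the staging directory to a blank Blu-Ray disc as a Rock Ridge/Joliet
# filesystem (growisofs passes -R -J through to the mkisofs back-end).
growisofs -Z /dev/dvd -R -J /staging
```

Keeping each partition's tarfile whole, rather than splitting archives across discs, is what makes later restoration a matter of reading one file from one disc.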
Why Blu-Ray worked for me
Blu-Ray worked where DVD failed because the 4.7GB capacity of the latter tended to be that bit too small for most things I wanted to back up, and even 8.5GB dual-layer DVDs were not quite enough for things like my entire Android development setup or my entire MP3 collection, whereas the 25–50GB of Blu-Ray was sufficient for even an uncompressed copy of my entire home partition. Although the all-in snapshot of my personal workstation I took today needed multiple discs, each disc held individual partitions in their entirety, so restoration would not need messing around with recombining discs; this property was preferable to some multi-part archive that could have fitted onto fewer discs. Coupled with some proper data organisation, Blu-Ray discs would be the largest medium I would require anytime soon.
An archival format
Blu-Ray also worked where portable hard drives and other USB devices failed because Blu-Ray is really an archival format rather than a backup format, and when I was backing up large portions of data it was things that I would basically never overwrite. A notable example is my picture collection, which I considered important enough that I wanted an off-site copy; another is my Windows virtual machines, which given the temperamental nature of that operating system I may well want to revert to a historical snapshot of. Above all, being a write-once format, short of getting physically destroyed the discs are never going to get corrupted, whereas things like removable hard drives always have that chance of electrical or mechanical failure.
The downside of Blu-Ray
The only real downside with Blu-Ray is the apparent failure of BDXL in the market, because 100GB discs are relatively rare and uneconomic. I did find a 5-pack for £40, but £8 each is still pretty steep, and they never actually turned up; Amazon refunded me on the basis they were a lost delivery, but I suspect they were never actually dispatched. Apparently 128GB discs are available but I have never seen any for sale, let alone in stock. When backing up my laptop 50GB was that bit too small, whereas 100–128GB would allow me to keep entire partitions on one disc.
Error closing Blu-Ray sessions
As an aside, there is a long-standing bug in growisofs that causes a spurious error alert, with example console output shown below, at the very end of a Blu-Ray disc burn. As far as I can tell this happens when a blank disc is used: the disc automatically gets formatted, but unlike with a pre-formatted disc this is not flagged, so the session-closing function complains even though there is nothing wrong with the disc. For some reason the simple fix never made it into the mainline of any Linux distribution, possibly because the upstream author turned support over to Debian, who in turn have shown no interest in maintaining it. Doing a manual md5sum check of individual files shows that the data on the disc itself is fine.
23223173120/23329048576 (99.5%) @3.2x, remaining 0:07 RBU 100.0% UBU 72.7%
23271571456/23329048576 (99.8%) @3.2x, remaining 0:04 RBU 100.0% UBU 78.8%
23319937024/23329048576 (100.0%) @3.2x, remaining 0:00 RBU 27.2% UBU 75.8%
builtin_dd: 11391152*2KB out @ average 3.1x4390KBps
/dev/dvd: flushing cache
/dev/dvd: closing track
/dev/dvd: closing session
:-[ CLOSE SESSION failed with SK=5h/INVALID FIELD IN CDB]: Input/output error
/dev/dvd: reloading tray
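A minimal sketch of such a manual check, assuming the source tree that was burned is still at hand and the disc is mounted (both paths illustrative):

```shell
# Build a checksum manifest from the source tree that was burned.
(cd /srv/snapshot && find . -type f -exec md5sum {} +) > /tmp/manifest.md5

# Verify every file on the mounted disc against that manifest;
# md5sum -c prints one OK line per file and exits non-zero on any mismatch.
(cd /mnt/bluray && md5sum -c /tmp/manifest.md5)
```

Running both commands from the respective directory roots keeps the paths in the manifest relative, so the same manifest works for source and disc alike.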
A major part of practical data backup is proper separation of actually-important files from other stuff that either does not change much or can easily be re-obtained, so it is possible to do quick partial snapshots and be confident that you actually have a copy of all recently-changed important files.
This is easier said than done but with proper organisation I am pretty sure that my absolutely most important files would fit onto a single CD, and that a DVD would be enough for all irreplaceable non-media files.
The whole point here is to make it quick and easy to create a backup that covers everything it would be a problem to lose.
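With that separation in place, a quick partial snapshot can be as simple as archiving only recently-changed files under the important directories (the directory names and the 30-day window are illustrative):

```shell
# Archive important files modified within the last 30 days into a dated tarball.
find ~/Documents ~/Projects -type f -mtime -30 -print0 |
    tar --null --files-from=- -czf ~/partial-$(date +%F).tar.gz
```

The null-terminated pipeline keeps filenames with spaces intact, and because only changed files are swept up the result usually fits comfortably on a CD or DVD.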
The bad old days
In the distant past I had files in all sorts of crazy places, and this was particularly bad with my earlier Windows-based systems, which is why my all-in archiving operation of 2012 involved making entire drive images even though this meant rampant duplication, including nesting of drive images from different operating systems.
Looking back I think in some cases the directory layout bordered on a deliberate attempt to make keeping track of what was where awkward.
At least with Linux the really important stuff was all in home directories, separate from the vast majority of the software-related files that can easily be re-downloaded, but that still needed drag-net copying because there was not much organisation within the home directories.
Sorting things out
The first serious attempt at organising my files properly from the start was preparing the SSD I would take with me to New Zealand in 2013, but things slipped in the years that followed.
It was only really the setting up of my laptop as my main computer in preparation for my 2016 trip to China, shortly followed by the rebuild of my desktop system, that got a lot of my personal files properly sorted.
Yes, I had MP3s under Music, loose files under Documents, photos under Pictures, and website stuff under WWW, but I was far from disciplined in keeping things in consistent places.
What really taught me discipline was the run-up to my 2019 job change, where I had to go through my company laptop separating out personal files I intended to keep from company files that were to be deleted.
At my next company I actually made a script that archived everything I would want to take with me when I left, namely personal files and program configuration files, which was convenient because I ended up having to clear out my company workstation remotely.
Delegating photograph storage
The vast bulk of my important files are media files, and while in the past I kept most of them on my laptop and relied on keeping full memory sticks, this was not a great approach as the laptop eventually filled up due to the relatively limited space of its solid-state drive.
I finally decided that I would consolidate all media files onto a dedicated portable hard drive, which also included collections of photos I had dug up from my pre-2013 archive.
While I had multiple reasons for doing this one side-effect was vastly reducing the amount of irreplaceable files that needed to be backed up from my computers.
Backing up this portable hard drive is not in itself critical since I still have the “original” photos, but I have since also archived the pictures onto Blu-Ray discs; for files that basically never change once they are created this is pretty much the optimal solution.
Use of version control
Although making use of Bitbucket was something I started for reasons other than backing up, it has indirectly negated the need for me to back up a large portion of the files I work on regularly.
In fact on my professional workstation, once I exclude stuff under source control or obtained from elsewhere, there is not that much work-related material left, and after a quick cleanup I will put most or all of it into a new repository.
Even though it contains binary files that are not really suitable for storage in a source control system, I will probably also put Documents into a repository, since it is not worth having a separate backup channel for the sake of only a dozen or so megabytes.
Even without the use of an external hosting service, version control systems usually negate the need to have multiple copies of a project, and once a project is committed it is easy to clear out all the files that are no longer required, with the only remaining data being just the changes. In many cases I have easily located past software projects by finding the repository into which I had placed several of them. Version control is not a magic bullet, but it does help make some problems go away, and I do wonder how I survived without it.
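A minimal sketch of putting a directory such as Documents under version control locally, with an optional hosted mirror for an off-site copy (the remote URL is hypothetical):

```shell
cd ~/Documents
git init                            # create a local repository in place
git add -A                          # stage everything, binaries included
git commit -m "Initial snapshot"

# Optionally mirror to a hosted remote for an off-site copy (hypothetical URL):
# git remote add origin git@bitbucket.org:example/documents.git
# git push -u origin master
```

Once the initial snapshot is committed, only changes accumulate, which is exactly the property that makes a repository double as a lightweight backup.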