Upgrading from Crashplan to Arq
in Computer
Tags: Backup, Software, Arq, Crashplan
Motivation
I have been using Crashplan for at least half a decade and wrote several posts about it. Over these years, though, Code 42, the company behind Crashplan, made a bunch of changes that made it unbearable to use.
The first issue is related to the upload speed. Historically, Crashplan was not known for being the fastest backup system out there. The usual issues were related to de-duplication being turned on. In my case, I ended up getting upload speeds of ~100 kB/s. I am not sure if there are technical reasons for this, but the speed was pretty constant regardless of which corner of the world I was backing up my data from. Coincidentally, this is about the 10 GB per day that Crashplan quotes on their webpage. The fix for this problem was to disable the de-duplication. Early versions of Crashplan had a GUI option to select the level of de-duplication. This option was later removed, but the setting was still changeable via the configuration file. In the last major update, Crashplan removed the config files altogether and one had to resort to some “creative solutions” to change the settings. Lately, it seemed that the speed issue got better (at least for my rather slow connection). I don’t know if this was because I uploaded less data recently or whether that is a permanent change. Either way, this only shows that you are at Crashplan’s mercy.
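As a quick sanity check on that figure, the arithmetic can be sketched as follows. This assumes a sustained rate of 100 kB/s and decimal units (1 kB = 1000 bytes); the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_upload_gb(rate_kb_per_s: float) -> float:
    """Data uploaded in one day at a constant rate, in GB (decimal units)."""
    return rate_kb_per_s * 1000 * SECONDS_PER_DAY / 1e9


def days_to_upload(total_tb: float, rate_kb_per_s: float) -> float:
    """Days needed to upload total_tb terabytes at a constant rate."""
    return total_tb * 1000 / daily_upload_gb(rate_kb_per_s)


if __name__ == "__main__":
    print(f"{daily_upload_gb(100):.2f} GB per day")
    print(f"{days_to_upload(1, 100):.0f} days for 1 TB")
```

At 100 kB/s this works out to a bit under 9 GB per day, so an initial backup of a single TB would take months.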
The second issue was related to the never-ending block synchronization. I understand the importance of verifying the data against the backup. But if you have a large amount of data, this process can take days. With the latest update, the block synchronization was triggered at least once a week, which meant that you could not back up your data for days, and if for whatever reason one of my external hard drives got disconnected, the whole process started all over again.
Another issue that started becoming more prominent is resource consumption. You will find plenty of people out there complaining that Crashplan is bad because it is not “native” but written in Java and therefore has bad performance. If you do not know what that means, don’t worry: it’s complete nonsense and you can ignore these statements. Java is plenty fast and more than suitable for this kind of job. The real problems seem to be with the design and the fact that Crashplan holds a lot of data in memory. Crashplan’s official recommendation is to allocate 1 GB of memory for every TB of data you have. This is clearly excessive and not necessary if you look at other backup systems.
For all these reasons I decided to investigate an alternative backup strategy.
Requirements
Backup solutions are very subjective. In this section, I lay out my requirements for an ideal backup solution.
- Fast Backups: The backup should be fast and not unnecessarily limited (at some point you will run into limits with everything, but 100 kB/s does not seem reasonable to me).
- One solution: I like to have an online backup as well as local backups (for redundancy and speed I want to have backups on local storage). This means the software needs to support multiple backup storage locations.
- Restoration via Client: If I want to restore data, I want to restore my files via the client back to the original location and not have to rely on a web download or the delivery of a hard drive.
After investigating several different software products for backup, I ended up choosing Arq. In the rest of this article I will outline my reasoning for choosing Arq. I won’t go into much detail about the other options I evaluated, but I will explain in the next section why you should not use Backblaze, as this would be the obvious choice for most people.
Do NOT use Backblaze
If you look at what other people complain about regarding Backblaze, it is usually their data retention policy. While I agree that it is abysmal, it is far from the worst part about Backblaze. One of the most important features of a backup solution is the ability to recover your data in case of an issue. In my opinion, having the ability to fully recover your backup data built into the backup client is paramount. In Backblaze you have essentially two options to recover your data:
- You can select some files and download them as a ZIP file via their web interface.
- They can send you a hard drive with the selected files.
The problems with these options are manifold:
- The most important one is that, to be able to send you the files via ZIP or on the hard drive, Backblaze needs to be able to decrypt the data. This is not true end-to-end encryption, and having your data sent unencrypted via the internet or mail is not great.
- All options are limited in size. This means you need to manually partition the files into chunks and later reverse that process again if you have a lot of data. Let’s assume you download the files in 1 GB ZIP files and you have a total of 1 TB. Then you would have to download 1000 ZIP files and unpack/copy them back.
- There is a good chance that you might lose some original permissions when unpacking/copying the files back.
- Especially with the web download you will need extra storage, first for the downloaded data and then for the uncompressed data.
- You need to spend extra money in the case of the physical recovery option.
On several occasions I have needed to recover data, including after a complete hard drive crash with several TB of data on it. With the recovery built into the client, all you have to do is activate the restore process and let the client do its thing (even if it takes a few days). Getting your data into a backup is worthless if you cannot get it out again. A backup solution that does not provide an easy and efficient path to data recovery is essentially a black hole for your data.
PRO TIP: You can actually use Arq to back up to Backblaze’s B2 cloud storage if you so desire.
Arq
First off I am not affiliated with Arq in any way and this article is just me sharing my opinion after spending (too) much time looking into a Crashplan alternative. You can download and try out Arq from the official webpage: https://www.arqbackup.com/.
If you google for Arq you might find a few bad reviews because of some upgrade between version 5 and 6 or some missing features. Because I started out with version 6, I did not experience these issues and with the current version 7, you should be very happy. While some folks might find this off-putting, I think this is a great testament to the Arq team and their commitment to delivering on what their customers want and need.
Selected Features
Arq by itself is a backup engine that supports multiple storage destinations. You will find support for all big cloud providers and of course local storage options as well. This allows you to be more independent from a single backup provider and still use the same backup software.
Another great feature that is missing from many other backup solutions is Arq’s ability to use system snapshots. This is a necessity if you want a consistent backup of a specific point in time and not a mix of files spread out over the time it took to run the backup. This differs slightly from Crashplan’s approach of backing up files individually, but ultimately I found it to be the sounder approach.
Arq is also highly configurable. While it has no centralized online dashboard that allows for remote configuration, you will find an option for most things you would like to configure.
Performance
Arq is really fast. Backing up to my external hard drive was only limited by the write speed of the drives. I did some tests with my internal SSD and have no doubt it will keep up with pretty much anything you throw at it. When backing up online, I always saturate my bandwidth without any issues. Despite that, my internet stays responsive and I cannot feel the backup running in the background (other backup solutions I tested do not support QoS).
Cost
You can find the latest pricing on Arq’s webpage. At the time of writing this article, it was about $50 for a single computer and about $80 for a 5-computer license. Arq uses a Dutch Model, which means you get a perpetual license and free updates for 1 year. If you want to get updates after that year, you can renew the updates for 1/2 of the license price. If you don’t want/need any updates, you can stick with the version that you already have. This price is only for the software itself and does not include any online storage. Arq supports pretty much any storage provider, so you can choose the one that matches your needs best. For myself, I chose a Google business account, which besides the extra storage for my backup also increases my storage for Gmail, Google Photos and other data stored in Google Drive (plus all the extra features you get with the GSuite).
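To make the pricing model concrete, here is a small sketch of the total software cost over several years. It assumes the approximate $50 single-computer price quoted above, one renewal at half the license price for every year after the first, and no price changes; the function name and parameters are made up for illustration, and storage costs are not included.

```python
def license_cost(years: int, license_price: float = 50.0,
                 keep_updating: bool = True) -> float:
    """Total software cost over `years` under a perpetual-license model.

    The first year of updates is included in the license price. Each
    following year costs half the license price if you keep renewing
    updates, and nothing if you stay on the version you already have.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    if not keep_updating:
        return license_price
    return license_price + (years - 1) * license_price / 2


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, "years:", license_cost(n), "with updates,",
              license_cost(n, keep_updating=False), "without")
```

Under these assumptions, five years with continuous updates cost three times the initial license price, while staying on the purchased version costs only the license itself.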
Arq also comes with a subscription plan that includes online storage. This might be a good option for users that have a smaller data volume or do not want to mess with 3rd party storage providers.
Support
While Arq is created by a very small team, I would rate their support as far superior to CrashPlan’s. Their responses are quick and to the point. If you have ever interacted with CrashPlan support, you know that you spend several rounds of email getting just canned, generic advice (which you can easily google).
Wishlist
Nothing is perfect and different users have different requirements. Here is my wishlist of features along with how important I find them:
- [HIGH] The biggest feature I wish Arq would support is that, when “keep deleted files” is activated, deleted files can be ignored when restoring (or at least that there is a toggle for that). In the current implementation, if you have that feature enabled and you restore a snapshot, you also restore all the deleted files, which is semantically incorrect as the files were not around when the snapshot was taken.
- [MEDIUM] A Linux/BSD client. Just a CLI would be sufficient.
- [MEDIUM] The smallest schedulable interval between backups is 1 hour. It might be useful to allow for shorter intervals.
- [LOW] Cloning a backup target. It would be great if I could duplicate a backup target with a new UUID. My use case for this would be when a backup on my external HDD fails and I want to duplicate the backup (including the history) from a different backup location.
Comparison
Why:
- change in speed and convenience
- resource intensive
- a lot of sync
My Requirements:
- Need to handle my data volume
- Need to be able to restore via the client
- Need to be fast and not resource intensive
Technical:
- Backups vs backup sets
- Snapshots vs file versions
- Scanning vs live update
- Speed
- Support
- One-man show but better technically, e.g., better understanding of the problem vs canned responses
Arq
- FS snapshots
- support for partial backups (without the need to re-scan the files)
- You choose when you
Runner up: restic
- does not support partial backups
- restore is all or nothing.
- changing file set does not match parent and rescans all the
Don’t use Backblaze
- security
- recovery