Enable deduplication in Zmanda to save space and speed up data transfers to storage
What is server-side deduplication?
Deduplication, regardless of where it happens, finds redundant data and removes those redundancies, storing only one copy of each piece of data and thus saving storage space. There are several ways to perform deduplication, but we won't cover all of them here.
Server-side deduplication means the data is first transported to the backup server, where it is deduplicated before being sent to storage. The result? Storage is used far more efficiently, and transfers from the backup server to storage finish much faster than they would for non-deduplicated data, because there is less data to send.
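The basic idea of "store one copy, keep references to it" can be sketched in a few lines of Python. This is a minimal illustration, not Zmanda's or zbackup's implementation: it assumes fixed-size 4 KiB chunks and SHA-256 as the content hash, both chosen here purely for illustration. Each unique chunk is stored once, and a per-backup "recipe" of chunk hashes is kept so the original data can be rebuilt (rehydrated) on restore.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size (assumption, not zbackup's setting)


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept exactly once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def backup(self, data: bytes) -> list[str]:
        """Split data into chunks, store unseen ones, and return the recipe."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first occurrence of a chunk consumes storage.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Rehydrate the original data from its list of chunk hashes."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = ChunkStore()
data = b"A" * CHUNK_SIZE * 10 + b"B" * CHUNK_SIZE  # 11 chunks, only 2 unique
recipe = store.backup(data)
assert store.restore(recipe) == data
print(len(data), store.stored_bytes())  # 45056 8192
```

Eleven chunks go in, but only two distinct chunks are stored; the recipe is what lets a restore reassemble the full 45056 bytes.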
How Zmanda does dedupe
Zmanda uses a rolling hash at the byte level. When the rolling hash finds a potentially redundant block, Zmanda computes a full cryptographic hash of that block to confirm that it really is identical, and only then deduplicates it. For more information, see this link.
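The following sketch shows the general technique of byte-level rolling-hash matching with cryptographic confirmation. It is not zbackup's code: the polynomial (Rabin-Karp style) rolling hash, the 64-byte block size, and SHA-256 as the confirming hash are all assumptions chosen for illustration. A cheap weak hash is updated in constant time as the window slides one byte, and only when it matches a stored block is the full SHA-256 computed to rule out a false match.

```python
import hashlib
import random

BLOCK = 64                       # illustrative block size (assumption)
BASE = 257                       # polynomial base for the rolling hash
MOD = (1 << 61) - 1              # large prime modulus
TOP = pow(BASE, BLOCK - 1, MOD)  # weight of the byte leaving the window


def weak_hash(block: bytes) -> int:
    """Polynomial (Rabin-Karp style) hash of a whole window."""
    h = 0
    for b in block:
        h = (h * BASE + b) % MOD
    return h


def roll(h: int, out_byte: int, in_byte: int) -> int:
    """Slide the window by one byte in O(1): drop out_byte, append in_byte."""
    h = (h - out_byte * TOP) % MOD
    return (h * BASE + in_byte) % MOD


def index_blocks(stored: bytes) -> dict[int, list[bytes]]:
    """Map weak hash -> SHA-256 digests of blocks already in storage."""
    index: dict[int, list[bytes]] = {}
    for i in range(0, len(stored) - BLOCK + 1, BLOCK):
        block = stored[i:i + BLOCK]
        index.setdefault(weak_hash(block), []).append(hashlib.sha256(block).digest())
    return index


def find_duplicates(new: bytes, index: dict[int, list[bytes]]) -> list[int]:
    """Offsets in `new` where a stored block reappears, at any byte alignment."""
    matches: list[int] = []
    if len(new) < BLOCK:
        return matches
    i = 0
    h = weak_hash(new[:BLOCK])
    while i + BLOCK <= len(new):
        window = new[i:i + BLOCK]
        # A weak-hash hit is only a candidate; confirm with a full cryptographic hash.
        if h in index and hashlib.sha256(window).digest() in index[h]:
            matches.append(i)
            i += BLOCK  # skip past the confirmed duplicate
            if i + BLOCK <= len(new):
                h = weak_hash(new[i:i + BLOCK])
            continue
        if i + BLOCK < len(new):
            h = roll(h, new[i], new[i + BLOCK])
        i += 1
    return matches


rng = random.Random(42)
old = rng.randbytes(BLOCK * 4)                     # four stored blocks
new = b"prefix!" + old[BLOCK:2 * BLOCK] + b"tail"  # block 2, shifted by 7 bytes
print(find_duplicates(new, index_blocks(old)))     # [7]
```

Because the window slides one byte at a time, the duplicate block is found at offset 7 even though it no longer sits on a 64-byte boundary, a case that fixed-size chunking would miss.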
LZMA compression and AES-128 encryption are both built into the deduplication tool, so Zmanda's other compression and encryption options are disabled once you enable deduplication in the Zmanda Management Console.
System considerations
Data is staged in the /var/lib/amanda/staging/zbackup directory while it is being deduplicated. Make sure this directory has enough free space to accommodate your backups.
We also recommend allocating 1 MB of RAM for every 1 GB of data before deduplication.
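The two sizing rules above can be checked with a short script. This is a hedged sketch, not a Zmanda tool: `recommended_ram_mb` and `staging_has_room` are hypothetical helper names, and the space check conservatively assumes the staging area may need room for the full, un-deduplicated backup size, which may overstate what is actually used.

```python
import os
import shutil

STAGING_DIR = "/var/lib/amanda/staging/zbackup"  # staging path from this article
RAM_MB_PER_GB = 1.0  # guideline: 1 MB of RAM per 1 GB of data before dedupe


def recommended_ram_mb(raw_backup_gb: float) -> float:
    """RAM (in MB) suggested by the 1 MB per 1 GB guideline."""
    return raw_backup_gb * RAM_MB_PER_GB


def staging_has_room(raw_backup_bytes: int, path: str = STAGING_DIR) -> bool:
    """Hypothetical helper: is at least raw_backup_bytes free where path lives?

    Conservatively assumes staging may need the full raw backup size.
    """
    probe = os.path.abspath(path)
    # Climb to the nearest existing parent so this works before the directory exists.
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:
            break
        probe = parent
    return shutil.disk_usage(probe).free >= raw_backup_bytes


if __name__ == "__main__":
    raw_gb = 500  # example backup size, for illustration only
    print(f"Suggested RAM: {recommended_ram_mb(raw_gb):.0f} MB")
    print("Staging has room:", staging_has_room(int(raw_gb * 1e9)))
```

For the 3.6 GB test described in the Performance section, the guideline works out to roughly 3.6 MB of RAM.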
How to enable deduplication in Zmanda
Here are the steps:
1. Install zbackup on the backup server with your distribution's package manager:
```shell
# Debian-based distributions (apt or apt-get)
sudo apt install zbackup

# RHEL-based distributions (dnf or yum)
sudo dnf install zbackup
```
2. Log in to the ZMC and go to the Sources page. Edit a source using the green pencil icon, then select the checkbox next to Data Deduplication.
3. That's it! The next backup that includes this source will be deduplicated. To verify, go to the storage location and compare the stored size with the size of the source data.
4. Performing a restore is business as usual. Just go through the normal steps and your data will be rehydrated and restored to the location of your choice.
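Steps 1 and 3 can also be checked from a script. This minimal sketch uses only the Python standard library: `shutil.which` looks for a `zbackup` executable on `PATH`, and `dir_size_bytes` (a hypothetical helper) totals file sizes under a directory. `STORAGE_PATH` is a placeholder; substitute the storage location you configured, assuming it is a locally mounted filesystem.

```python
import os
import shutil


def zbackup_installed() -> bool:
    """True if a `zbackup` executable is found on PATH."""
    return shutil.which("zbackup") is not None


def dir_size_bytes(path: str) -> int:
    """Total size of regular files under `path`, ignoring symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    # Hypothetical path: replace with the storage location configured in the ZMC.
    STORAGE_PATH = "/path/to/your/storage"
    print("zbackup on PATH:", zbackup_installed())
    if os.path.isdir(STORAGE_PATH):
        print(f"Stored size: {dir_size_bytes(STORAGE_PATH) / 1e9:.2f} GB")
```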
Performance
The effectiveness of deduplication depends on how many redundant blocks of data there are. Therefore, some types of data will deduplicate better than others.
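To see why the data type matters, this sketch measures a dedupe ratio (raw size divided by the size of unique chunks) for two synthetic inputs. It assumes simple fixed-size 4 KiB chunks and SHA-256 rather than zbackup's byte-level rolling hash, and the inputs are made up for illustration; they are not Zmanda test data.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # illustrative fixed chunk size (assumption)


def dedupe_ratio(data: bytes) -> float:
    """Raw size divided by the total size of unique fixed-size chunks."""
    unique: dict[bytes, int] = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return len(data) / sum(unique.values())


# Log-like data: the same 4 KiB block of lines repeated many times.
line = b"2024-01-01 INFO service heartbeat ok\n"
block = (line * (CHUNK_SIZE // len(line) + 1))[:CHUNK_SIZE]
logs = block * 256
# Random bytes: essentially no repeated chunks.
noise = random.Random(0).randbytes(CHUNK_SIZE * 256)

print(f"log-like: {dedupe_ratio(logs):.1f}")   # log-like: 256.0
print(f"random:   {dedupe_ratio(noise):.1f}")  # random:   1.0
```

The repeated log-like input collapses to a single stored chunk, while random bytes (which is what already-compressed or encrypted files look like) do not deduplicate at all.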
In an in-house test using Zmanda, we backed up the /var/log directory of a Linux client (logs are typically dedupe-friendly files). Here are the results:
| Dedupe Enabled/Disabled | Raw Data Size | Stored Data Size | Dedupe Ratio |
|---|---|---|---|
| Disabled | 3.6 GB | 3.6 GB | 1 |
| Enabled | 3.6 GB | 47 MB | 76.9 |
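The dedupe ratio in the table is the raw data size divided by the stored data size. As a quick arithmetic check, assuming decimal units (1 GB = 1000 MB), recomputing it from the rounded sizes shown gives about 76.6, close to the reported 76.9; the small gap is presumably due to rounding of the displayed sizes. The helper names below are illustrative.

```python
def dedupe_ratio(raw_bytes: float, stored_bytes: float) -> float:
    """Dedupe ratio = raw (pre-dedupe) size / size actually stored."""
    return raw_bytes / stored_bytes


def space_saved_pct(raw_bytes: float, stored_bytes: float) -> float:
    """Percentage of raw size that did not need to be stored."""
    return 100.0 * (1 - stored_bytes / raw_bytes)


raw, stored = 3.6e9, 47e6  # sizes from the table, using decimal GB/MB
print(f"ratio ~ {dedupe_ratio(raw, stored):.1f}")      # ratio ~ 76.6
print(f"saved ~ {space_saved_pct(raw, stored):.1f}%")  # saved ~ 98.7%
```

In other words, with deduplication enabled, roughly 98.7% of the raw /var/log data did not need to be written to storage.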