
How to troubleshoot common issues in Zmanda Disaster Recovery service

This article explores common scenarios that can lead to failures in Zmanda's Disaster Recovery (DR) functionality. It applies to Zmanda version 5.1 and is divided into two sections:

Section 1: DR sync failure scenarios

This section explores situations that can lead to DR sync failures. Topics covered include:

  • DR repository not configured

  • Unreachable DR repository

  • License issues

  • Health check failure

  • Non-primary current host

Section 2: DR service - restore flows failure scenarios

This section covers potential causes of failures during the DR restore process. Topics covered include:

  • Improper installation

  • Expired license

  • Missing file

  • Import command failures

  • Network issues

  • Security of the storage

  • Power outage during restore

DR sync failure scenarios

Scenario 1: DR repository not configured

If the DR repository hasn't been configured, Zmanda doesn't know where to store server backups, and DR sync fails.

To fix this issue, configure the DR repository where Zmanda will store your server backups.

Scenario 2: Unreachable DR repository

Even with a configured DR repository, the following issues can prevent Zmanda from accessing it:

  • Network issues: Disconnected cables, Wi-Fi drops, or firewall/router problems can cause connection problems.

  • Firewall/antivirus blocking: Firewalls or antivirus software on either side may block traffic. Check that ports 137-139 and 445 are open.

  • Account blocking policies: Repeated failed login attempts can result in account lockouts.

  • Missing DR repository: The configured DR repository might have been moved or deleted. Verify its presence and availability.
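The port check mentioned above can be sketched in shell. This is a minimal sketch, not part of Zmanda: the function names port_open and check_smb_ports are hypothetical, and it assumes the nc (netcat) utility is installed on the Zmanda server. It probes only TCP ports 139 and 445, because ports 137 and 138 carry NetBIOS traffic over UDP and a TCP probe says nothing about them.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with Zmanda.

# port_open HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
# Assumes nc (netcat) is installed.
port_open() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# check_smb_ports HOST
# Prints one status line per TCP port used by SMB and returns non-zero
# if any of them cannot be reached. UDP ports 137-138 are not probed.
check_smb_ports() {
    host=$1
    failed=0
    for port in 139 445; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}

# Example (replace the placeholder with your DR repository host):
#   check_smb_ports <dr-repository-host>
```

A "NOT reachable" line points to a firewall, routing, or host-down problem rather than to a credentials or share-path problem.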

Specific to SMB storage:

  • Name resolution problems: Issues with DNS or NetBIOS resolution can prevent connection establishment.

  • Incorrect share path: Ensure the path to the shared resource is correct. Typos or incorrect paths can lead to failures.

  • Permissions and share settings: User accounts need proper permissions on the shared resource. Misconfigured share settings can also cause failures.

  • Incorrect credentials: Wrong credentials for the configured SMB share will prevent the connection.
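One way to test the share name and credentials independently of Zmanda is to list the server's shares with the Samba smbclient tool. The sketch below assumes smbclient is installed on the Zmanda server; the function name share_exists is hypothetical. It relies on smbclient's grepable output (-g), which prints one "type|name|comment" line per share.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not shipped with Zmanda.

# share_exists SERVER SHARE USER
# Returns 0 if SHARE is listed as a disk share on SERVER.
# smbclient prompts for USER's password; a login failure or an
# unresolvable server name produces no matching line, so the
# function returns non-zero in those cases as well.
share_exists() {
    server=$1
    share=$2
    user=$3
    smbclient -L "$server" -U "$user" -g |
        awk -F'|' -v s="$share" '
            $1 == "Disk" && $2 == s { found = 1 }
            END { exit !found }
        '
}

# Example (placeholders must be replaced with your own values):
#   share_exists <smb-server> <share-name> <smb-user> && echo "share found"
```

If the function fails, running the smbclient command by hand shows whether the cause is name resolution, authentication, or a missing share.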

DR service error handling for invalid SMB details:

  • The current implementation checks for connection before registering in the ZMC database. Incorrect authentication information will return appropriate errors.

  • If the provided share name does not exist on the SMB server, an error message will be displayed.

  • Default port and domain name are configured if not provided by the customer.

Scenario 3: License issues

  • An active license is required for all Zmanda operations, including Disaster Recovery. Without a valid license, DR won't take backups and DR sync will fail.

  • If the license has expired, DR sync will be terminated immediately.

Scenario 4: Health check failure

The DR service checks the health of servers before backups. If the health check fails, DR sync is terminated. This indicates potential server issues.

To check the status of the Zmanda installation, execute the commands setup-zmc, setup-aee and setup-dedup-server as a superuser. These commands show the installation status of the Zmanda services. Verify that all services are running.

Zmanda employs an exponential retry mechanism when it encounters a failed health check. This means it will attempt to retrieve the server's information multiple times with increasingly longer intervals between retries. However, if the server remains unresponsive or inaccessible after four attempts, Zmanda will ultimately terminate the DR sync process.
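The retry pattern described above can be sketched in shell. This illustrates exponential backoff in general and is not Zmanda's internal code: the function name retry_with_backoff, the starting delay, and the doubling factor are assumptions, since this article states only that the intervals grow and that the sync stops after four attempts.

```shell
#!/bin/sh
# Illustrative exponential backoff; NOT Zmanda's implementation.

# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# The wait between attempts starts at BASE_DELAY_SECONDS and doubles
# after every failure. Returns 0 on success, 1 if all attempts fail.
retry_with_backoff() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only wait if another attempt will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
            delay=$((delay * 2))
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example: four attempts with waits of 2, 4, and 8 seconds in between.
# some_health_check is a hypothetical command that exits non-zero on failure.
#   retry_with_backoff 4 2 some_health_check
```

With four attempts there are three waits, so a server that recovers within that window lets the sync continue, while one that stays down causes the sync to be terminated.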

Scenario 5: Non-primary current host

The DR service backs up only the primary server. If a DR restore is performed on the secondary server and the automatic switch from the secondary back to the primary server fails during the restore, DR sync may fail.

DR service - restore flows failure scenarios

This section covers issues that can lead to DR restore failures.

Scenario 1: Improper installation

Check the status of Zmanda components:

setup-zmc

setup-aee

setup-dedup-server

Make sure all components are running.

Scenario 2: Expired license

An active license is required for restoration on the secondary server. If your license has expired, renew it through the Zmanda Network.

Scenario 3: Missing file

The /var/lib/amanda/drservice/drconfig file is crucial for the restore process. Copying this file to the secondary server allows the DR service to establish a connection with the backup repository, which is required for a successful restore. Without this file, the DR service on the secondary server cannot locate or access the stored backups.

Scenario 4: Import command failures

  • The /opt/zmanda/amanda/bin/drservice import --sources <path-to-drconfig-on-secondary> command must be executed on the secondary server to activate the DR instance.

  • Ensure the drconfig file has read permissions for the amandabackup user with these commands:

chown amandabackup: <path-to-drconfig-on-secondary>
chmod 644 <path-to-drconfig-on-secondary>
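Before running the import, it can help to confirm that the file is in place and readable. The following is a minimal sketch; the function name check_drconfig is hypothetical and not part of Zmanda. The readability test applies to the user running the script, so run it as amandabackup to check that account's access.

```shell
#!/bin/sh
# Hypothetical pre-flight check for illustration; not shipped with Zmanda.

# check_drconfig PATH
# Prints a one-line status and returns non-zero if the drconfig file
# is missing, empty, or not readable by the current user.
check_drconfig() {
    path=$1
    if [ ! -f "$path" ]; then
        echo "missing: $path"
        return 1
    fi
    if [ ! -s "$path" ]; then
        echo "empty: $path"
        return 1
    fi
    if [ ! -r "$path" ]; then
        echo "not readable: $path"
        return 1
    fi
    echo "ok: $path"
}

# Example (replace the placeholder with the real location):
#   check_drconfig <path-to-drconfig-on-secondary>
```

A "missing" or "empty" result means the file needs to be copied again from the primary server; "not readable" means the chown and chmod commands above still need to be applied.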

Scenario 5: Network issues

Even with a configured DR repository, the following issues can prevent Zmanda from accessing it:

  • Network issues: Disconnected cables, Wi-Fi drops, or firewall/router problems can cause connection problems.

  • Firewall/antivirus blocking: Firewalls or antivirus software on either side may block traffic. Check that ports 137-139 and 445 are open.

  • Account blocking policies: Repeated failed login attempts can result in account lockouts.

  • Missing DR repository: The configured DR repository might have been moved or deleted. Verify its presence and availability.

Scenario 6: Security of the storage

  • Name resolution problems: Issues with DNS or NetBIOS resolution can prevent connection establishment.

  • Incorrect share path: Ensure the path to the shared resource is correct. Typos or incorrect paths can lead to failures.

  • Permissions and share settings: User accounts need proper permissions on the shared resource. Misconfigured share settings can also cause failures.

  • Incorrect credentials: Wrong credentials for the configured SMB share will prevent the connection.

Scenario 7: Power outage during restore

Ensure machines don't lose power during the restore process, as an outage can leave Zmanda in an inconsistent state.

Note:

  • If you've changed the time zone after installing ZMC, restart the DR service and perform the DR restore again to avoid unexpected behavior, such as the instance being shown as inactive after a successful restore.

  • To restart the DR service, open a terminal window and execute the following command:

sudo systemctl restart __sub-zman-dr

With this article, you should be able to troubleshoot DR sync failures and DR restore failures. If you have any further questions, reach out to our sales team.