Disaster Recovery

In IT infrastructure, Disaster Recovery (DR for short) is a process put in place to quickly restore normal business operations after a failure or disaster.

Tier 1 application:

Applications that require very high uptime, and whose failure could have a huge impact on an enterprise (financially, in reputation, or in regulatory compliance), are grouped under 'Tier 1 applications'. For instance, banking applications, stock market applications, and air traffic control applications cannot afford downtime and need to be up and running all the time.

Disaster:

An infrastructure failure can be caused by hardware failures, disk crashes, or natural catastrophes such as earthquakes, floods, or fires. The Disaster Recovery process typically covers all the steps necessary to bring the business back to normal.

Recovery:

A Disaster Recovery setup requires a redundant deployment that duplicates the entire infrastructure. The DR infrastructure should generally be deployed in a different geographical region, to guard against natural calamities that could affect the whole primary site.

A DR setup also requires the data to be replicated across the redundant deployments, so that when one deployment goes down, the other can start serving requests immediately. For that to happen, the data persisted by the system (databases, object stores, etc.) should be replicated in real time to the redundant system.
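The failover idea described above can be sketched in a few lines. This is only an illustration under simple assumptions: the `Site` class, the `choose_active_site` function, and the region names are hypothetical, and real health checks and traffic switching (DNS, load balancers) are far more involved.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A deployment of the full infrastructure in one region (hypothetical model)."""
    name: str
    healthy: bool = True


def choose_active_site(primary: Site, secondary: Site) -> Site:
    """Return the site that should serve requests right now."""
    if primary.healthy:
        return primary
    if secondary.healthy:
        return secondary
    raise RuntimeError("no healthy site available")


primary = Site("primary-region")
secondary = Site("dr-region")

active_before = choose_active_site(primary, secondary).name

primary.healthy = False  # simulate a disaster at the primary site
active_after = choose_active_site(primary, secondary).name
```

Requests go to the primary site while it is healthy and switch to the DR site once it is not. The switch is only useful if the DR site already holds a current copy of the data, which is why replication matters.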

Replication:

Replication is a process in which data is continuously and incrementally copied, in real time, to a redundant data storage system. Because only modified data is copied, the redundant copy can be kept up to date and is available if the primary copy goes down.
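A minimal sketch of incremental copying, assuming a simple key-value store: the primary remembers which keys changed since the last replication pass and copies only those. The class `PrimaryStore` and its methods are hypothetical names used for illustration, not the API of any real product.

```python
class PrimaryStore:
    """Toy key-value store that tracks which keys changed since the last copy."""

    def __init__(self):
        self.data = {}
        self.dirty = set()  # keys modified since the last replication pass

    def write(self, key, value):
        self.data[key] = value
        self.dirty.add(key)

    def replicate_to(self, replica: dict) -> int:
        """Copy only the modified keys to the replica; return how many were copied."""
        copied = 0
        for key in sorted(self.dirty):
            replica[key] = self.data[key]
            copied += 1
        self.dirty.clear()
        return copied


store = PrimaryStore()
replica = {}

store.write("a", 1)
store.write("b", 2)
first_pass = store.replicate_to(replica)   # both keys are new, so both are copied

store.write("a", 10)
second_pass = store.replicate_to(replica)  # only "a" changed since the last pass
```

After the second pass the replica matches the primary even though only one key was transferred.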

Replication types:

There are two ways in which the data could be replicated across redundant systems:

Hardware Replication
Software Replication

1. Hardware Replication

In this type of replication, the data copy is performed by the storage hardware itself. One disadvantage is that it requires matching hardware at both sites. Maintaining matching hardware at the primary production site and the backup site can be expensive, because production generally uses high-performance hardware while the backup site could otherwise make do with lower-performance hardware. However, hardware replication puts little overhead on the production servers, since the work happens at the hardware level.
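To make the block-level nature of this approach concrete, here is a software simulation of a block-to-block copy. It is only a sketch: a real storage array does this in firmware, and the function name `replicate_blocks` and the tiny `BLOCK_SIZE` are assumptions chosen for readability.

```python
BLOCK_SIZE = 4  # hypothetical block size in bytes, tiny for illustration


def replicate_blocks(source: bytes, target: bytearray) -> list[int]:
    """Copy every block that differs from source to target; return their indices."""
    if len(source) != len(target):
        raise ValueError("volumes must be the same size")
    changed = []
    for offset in range(0, len(source), BLOCK_SIZE):
        block = source[offset:offset + BLOCK_SIZE]
        if target[offset:offset + BLOCK_SIZE] != block:
            target[offset:offset + BLOCK_SIZE] = block
            changed.append(offset // BLOCK_SIZE)
    return changed


primary_volume = bytearray(b"AAAABBBBCCCC")
backup_volume = bytearray(primary_volume)  # backup starts as an identical copy

primary_volume[5] = ord("x")  # a write lands somewhere inside block 1
changed_blocks = replicate_blocks(bytes(primary_volume), backup_volume)
```

The copy works purely on raw blocks and has no idea what the bytes mean to the application.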

2. Software Replication

In this type of replication, the data copy is performed by software, either at the hypervisor level or, often, by database replication software. Its advantage is that the software is aware of the application state while replicating, and it also works well with distributed systems. It is not a mere block-to-block data copy as in hardware replication. The trade-off is that software-based replication can add performance overhead.
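One common way software can be application-aware is to ship whole committed transactions rather than raw blocks. The sketch below assumes that approach; the classes `Primary` and `Replica` and their methods are hypothetical and do not correspond to any specific database's replication API.

```python
class Primary:
    """Toy primary that records each committed transaction in an ordered log."""

    def __init__(self):
        self.log = []  # ordered list of committed transactions

    def commit(self, changes: dict) -> int:
        self.log.append(dict(changes))
        return len(self.log)  # sequence number of this transaction


class Replica:
    """Toy replica that replays transactions it has not applied yet."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # sequence number of the last applied transaction

    def catch_up(self, primary: Primary) -> int:
        pending = primary.log[self.applied:]
        for changes in pending:
            self.data.update(changes)  # each transaction is applied as a whole
        self.applied += len(pending)
        return len(pending)


primary_db = Primary()
replica_db = Replica()

primary_db.commit({"alice": 50, "bob": 150})
primary_db.commit({"alice": 40, "carol": 10})

shipped = replica_db.catch_up(primary_db)
```

Because the unit of replication is a committed transaction, the replica only ever moves from one state the application considers valid to another. Tracking and replaying the log is also where the extra overhead comes from.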
