Understanding AlwaysOn Availability Groups in SQL Server

AlwaysOn Availability Groups (AGs) are a high availability and disaster recovery solution introduced in SQL Server 2012. They enable you to group a set of user databases together for failover and replication across multiple SQL Server instances.

What Is an Availability Group?

An Availability Group is a logical container for a group of user databases—called availability databases—that fail over together. Each AG includes:

One primary replica (read-write)
One or more secondary replicas (read-only or standby)

These replicas reside on different SQL Server instances, typically across different nodes in a Windows Server Failover Cluster (WSFC).

Key Components of AlwaysOn AG

Primary Replica: The source of all write operations. All changes are sent from here to the secondary replicas.
Secondary Replicas: Receives changes from the primary. Can be configured as readable or for backup purposes.
Availability Databases: A collection of user databases that are replicated and failed over as a unit.
Availability Group Listener: A virtual network name (and IP) clients use to connect, automatically routing to the current primary replica.
WSFC: Provides health monitoring and failover orchestration for the AG.

How AlwaysOn AG Works

The AG uses a combination of database mirroring technology and WSFC. Here’s a step-by-step overview of how it works:

The primary replica receives client write operations and logs them in the transaction log.
The log records are sent to each secondary replica. This can be done synchronously or asynchronously:
- Synchronous: Primary waits for acknowledgment from secondary before committing the transaction.
- Asynchronous: Primary does not wait; better for long-distance DR setups but can result in data loss on failover.
Secondary replicas apply the transaction logs to their local copies of the databases, keeping them in sync.

Failover Types

Automatic Failover: Requires synchronous replication and a healthy WSFC quorum. Seamless for the application.
Manual Failover: Used in maintenance or when automatic failover isn’t configured.
Forced Failover (Data Loss): Forces promotion of an asynchronous secondary. Risk of data loss.

Read-Scale and Backup Offloading

AlwaysOn AGs allow routing of read-only queries to secondary replicas using “Read-Only Routing.” This helps balance workloads and improve performance.

Similarly, backups can be offloaded to secondary replicas, reducing the load on the primary system.

Limitations to Be Aware Of

Databases in an AG must use the full recovery model
AGs require Windows Server Failover Clustering
Complex to configure and monitor without tools

Diagram: Data Flow in a Synchronous Commit Availability Group

The following describes how data moves between replicas in a synchronous commit configuration:

Client sends a write transaction to the Primary Replica.
Primary writes the log record to the transaction log and sends it to the Secondary Replica.
Secondary Replica receives and hardens the log to disk (writes it to its own transaction log).
Secondary sends an ACK (acknowledgement) back to the Primary.
Primary commits the transaction and sends confirmation to the client.

This roundtrip ensures zero data loss, but can introduce latency if the secondary is slow or experiencing I/O pressure.

Illustration

+-----------------+        Transaction        +--------------------+
|  Client App     |  ---------------------->  |   Primary Replica  |
+-----------------+                           +--------------------+
                                                 | Write to log
                                                 V
                                          +--------------------+
                                          | Transaction Log    |
                                          +--------------------+
                                                 |
                                 Send log record to secondary
                                                 V
                                      +-----------------------+
                                      |  Secondary Replica    |
                                      +-----------------------+
                                              |
                                   Harden log and send ACK
                                              V
                                +----------------------------+
                                |   ACK sent back to Primary |
                                +----------------------------+

       (Only after ACK is received does the Primary commit the transaction)

This behavior is what enables synchronous commit mode to guarantee zero data loss, but it also means that performance is directly tied to the responsiveness of the secondary replica.

Redo Processing in Secondary Replica

After a transaction log record is hardened (written to disk) on the secondary replica, it still needs to be replayed into the actual data file through a process called redo. This process is handled by the redo thread on each secondary database.

The redo queue holds the log records that are waiting to be applied to the data pages. Until redo is complete, the changes are not visible in the data file.

Step-by-Step Flow

Primary replica sends log records to the secondary.
Secondary hardens the log records to its transaction log.
Those records are placed in the redo queue.
The redo thread applies the changes to the data file (i.e., performs logical/physical updates).
Once redo is completed, the database pages reflect the committed changes.

Diagram: Redo Process on Secondary

                   +------------------------+
                   |  Secondary Replica     |
                   +------------------------+
                            |
                   Hardened Transaction Log
                            |
                            V
                   +--------------------+
                   |   Redo Queue       |
                   +--------------------+
                            |
                            V
                  +------------------------+
                  |  Redo Thread Applies   |
                  |  Changes to Data File  |
                  +------------------------+
                            |
                            V
                 +-----------------------------+
                 |   Data File Updated with    |
                 |   Committed Transaction     |
                 +-----------------------------+

Note: Redo lag can occur if the secondary replica is under heavy I/O pressure or lacks CPU resources. Monitoring redo queue size and redo rate is crucial in synchronous environments to avoid latency.