HA (High Availability)

Definition

High Availability (HA) is a system design approach that minimizes downtime by eliminating single points of failure. HA systems use redundancy (multiple components performing the same function) so that if one component fails, another takes over with little or no interruption to service.

HA is typically expressed in “nines”: the number of leading 9s in the annual uptime percentage. For example, 99.99% uptime is “four nines”.
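The count of nines can also be computed directly from the unavailability (1 minus the uptime fraction) as its negative base-10 logarithm. The sketch below assumes that definition; `count_nines` is a hypothetical helper for illustration, not part of any library.

```python
import math


def count_nines(availability_pct: float) -> float:
    """Return the "number of nines" for an availability percentage.

    Uses nines = -log10(1 - A), where A is the uptime as a fraction.
    99.9% gives 3.0; 99.95% gives about 3.3 (its trailing 5 is why it is
    loosely called "three and a half nines").
    """
    if not 0 <= availability_pct < 100:
        raise ValueError("availability must be in [0, 100)")
    unavailability = 1 - availability_pct / 100
    return -math.log10(unavailability)


for pct in (99.0, 99.9, 99.95, 99.999):
    print(f"{pct}% -> {count_nines(pct):.2f} nines")
```

Because the scale is logarithmic, each additional nine cuts the permitted downtime by a factor of ten.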

HA Uptime Measurement

Uptime    | Downtime per year | Common name
90%       | 36.5 days         | One nine
99%       | 3.65 days         | Two nines
99.5%     | 1.83 days         | Two and a half nines
99.9%     | 8.76 hours        | Three nines
99.95%    | 4.38 hours        | Three and a half nines
99.99%    | 52.6 minutes      | Four nines
99.999%   | 5.26 minutes      | Five nines (telco-grade)
99.9999%  | 31.5 seconds      | Six nines

Downtime figures assume a 365-day year.
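The downtime column follows from multiplying the unavailability by the length of a year. A minimal sketch, assuming a 365-day year (leap years ignored); `downtime_per_year` and `human` are hypothetical helpers:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s; ignores leap years


def downtime_per_year(availability_pct: float) -> float:
    """Allowed downtime, in seconds, for a given uptime percentage."""
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR


def human(seconds: float) -> str:
    """Format seconds using the largest unit that fits (days to seconds)."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.1f} seconds"


for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {human(downtime_per_year(pct))}")
```

For example, 99.99% leaves 0.0001 × 31,536,000 s = 3,153.6 s, or about 52.6 minutes, matching the table.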

HA Patterns

Pattern              | Description                                            | Example
Active-Passive       | Primary handles traffic; standby takes over on failure | Database master/replica
Active-Active        | All nodes serve traffic simultaneously                 | Load-balanced web servers
Multi-region         | Systems deployed across geographic regions             | CDN, cloud multi-region
Chassis redundancy   | Redundant power supplies, fans, controllers            | Enterprise switches, routers
Database replication | Real-time data copy across nodes                       | MySQL replication, PostgreSQL streaming replication
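The difference between the two main routing patterns can be sketched as a node-selection rule over health flags. This is an illustrative model only: `Node`, `active_passive`, and `ActiveActive` are hypothetical names, not the API of HAProxy, Keepalived, or any database, and the health flags are set by hand where a real system would update them from health checks.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True  # in practice, updated by a health checker


def active_passive(nodes: list[Node]) -> Node | None:
    """Send all traffic to the first healthy node in priority order.

    nodes[0] is the primary; the remaining nodes are standbys, tried in order.
    """
    return next((n for n in nodes if n.healthy), None)


class ActiveActive:
    """Spread requests round-robin across every healthy node."""

    def __init__(self, nodes: list[Node]):
        self.nodes = nodes
        self._counter = itertools.count()

    def pick(self) -> Node | None:
        healthy = [n for n in self.nodes if n.healthy]
        if not healthy:
            return None
        return healthy[next(self._counter) % len(healthy)]


primary, standby = Node("db-primary"), Node("db-replica")
print(active_passive([primary, standby]).name)
primary.healthy = False  # simulate a primary failure
print(active_passive([primary, standby]).name)
```

Real active-passive database failover must also deal with replication lag and with preventing two nodes from acting as primary at once (split brain); this sketch covers only the routing decision.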

HA vs Fault Tolerance

Feature    | HA                         | Fault Tolerance
Downtime   | Brief (seconds to minutes) | Zero (seamless failover)
Cost       | Moderate                   | Very high
Complexity | Moderate                   | High
Use case   | Most enterprise systems    | Mission-critical (finance, healthcare)
Example    | Load-balanced servers      | Mirrored systems with instant failover

HA Components

Component   | HA strategy
Servers     | Clustering, load balancing, failover
Storage     | RAID, SAN replication, dual-controller arrays
Network     | Redundant paths, link aggregation, VRRP/HSRP
Database    | Replication, clustering, sharding (limits the impact of a node failure, but is not redundancy on its own)
Power       | UPS, dual power feeds, generators
Data center | Multi-site deployment, geographic redundancy
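Why redundancy at each layer matters can be shown with the standard series/parallel availability model: a request path needs every layer up (series), while a redundant layer stays up if any copy is up (parallel). The sketch assumes independent failures, and the per-component availabilities below are made-up illustrative numbers, not measurements.

```python
from math import prod


def parallel(*availabilities: float) -> float:
    """Redundant copies: the layer is up if at least one copy is up.

    Assumes the copies fail independently of one another.
    """
    return 1 - prod(1 - a for a in availabilities)


def series(*availabilities: float) -> float:
    """A chain of layers: the system is up only if every layer is up."""
    return prod(availabilities)


# Hypothetical availabilities per component (illustrative only).
server, storage, network, power = 0.99, 0.995, 0.999, 0.999

single = series(server, storage, network, power)
redundant = series(
    parallel(server, server),
    parallel(storage, storage),
    parallel(network, network),
    parallel(power, power),
)
print(f"no redundancy:    {single:.5%}")
print(f"each layer x2:    {redundant:.5%}")
```

With these numbers, the unduplicated chain is roughly 98.3% available, while duplicating every layer raises it to roughly 99.987%. Correlated failures (a shared power feed, the same software bug on both nodes) violate the independence assumption and make real gains smaller.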

HA Tools and Technologies

Tool/Technology                  | HA function
Keepalived                       | VRRP for virtual IP failover
Pacemaker/Corosync               | Cluster resource management
HAProxy                          | Load balancer with health checks
NGINX                            | Load balancer with upstream health checks
MySQL Replication                | Database master/replica HA
PostgreSQL Streaming Replication | Database HA
etcd                             | Distributed consensus for HA
Kubernetes                       | Pod scheduling, ReplicaSets, StatefulSets
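Consensus-based systems such as etcd (which uses Raft) stay available only while a strict majority of voting members can communicate. The arithmetic below is a sketch of that majority rule; `quorum` and `tolerated_failures` are hypothetical helper names.

```python
def quorum(cluster_size: int) -> int:
    """Smallest strict majority of the voting members."""
    if cluster_size < 1:
        raise ValueError("a cluster needs at least one member")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return cluster_size - quorum(cluster_size)


for n in range(1, 8):
    print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

The output shows why odd cluster sizes (3, 5, 7) are the usual choice: a 4-member cluster needs 3 votes and tolerates only 1 failure, the same as a 3-member cluster, while adding one more machine that can fail.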

HA in Kubernetes

  • Pod replicas: Multiple replicas of the same Deployment
  • Node redundancy: Multiple worker nodes
  • Control plane HA: Multiple API servers, etcd cluster quorum
  • Persistent volumes: Multi-path storage, replicated PVs
  • Service: Stable endpoint regardless of pod failures

Related Terms

  • Disaster Recovery: HA prevents downtime; backup prevents data loss
  • Cloud: cloud providers offer multi-AZ HA
