HA (High Availability)

Definition

High Availability (HA) is a system design approach that minimizes downtime by eliminating single points of failure. HA systems use redundancy (multiple components performing the same function) so that if one component fails, another takes over with little or no interruption.

HA is typically measured in “nines”: the number of 9s in the annual uptime percentage (for example, 99.9% is “three nines”).

HA Uptime Measurement

Availability	Downtime/year	Nines
90%	36.5 days	One nine
99%	3.65 days	Two nines
99.5%	1.83 days	Two and a half nines
99.9%	8.76 hours	Three nines
99.95%	4.38 hours	Three and a half nines
99.99%	52.6 minutes	Four nines
99.999%	5.26 minutes	Five nines (telco-grade)
99.9999%	31.5 seconds	Six nines
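
The downtime figures in the table follow directly from the availability percentage. As a minimal sketch, assuming a 365-day year (which is what the table's numbers correspond to) and a hypothetical helper name annual_downtime, the arithmetic looks like this:

```python
from datetime import timedelta

DAYS_PER_YEAR = 365  # assumption: non-leap year, consistent with the table


def annual_downtime(availability_percent: float) -> timedelta:
    """Return the downtime per year permitted by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return timedelta(days=DAYS_PER_YEAR * unavailable_fraction)


# Print the permitted downtime for a few common availability targets.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {annual_downtime(pct)}")
```

Using a 365.25-day year instead would shift each figure slightly, so published tables can differ in the last digit.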

HA Patterns

Pattern	Description	Example
Active-Passive	Primary handles traffic; standby takes over on failure	Database master/replica
Active-Active	All nodes serve traffic simultaneously	Load-balanced web servers
Multi-region	Systems deployed across geographic regions	CDN, cloud multi-region
Chassis redundancy	Redundant power supplies, fans, controllers	Enterprise switches, routers
Database replication	Continuous data copy across nodes	MySQL replication, PostgreSQL streaming replication
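
The active-passive pattern can be illustrated with a small sketch. It is not taken from any particular product: the Node and FailoverGroup names and the is_healthy probe are hypothetical, and a real cluster manager would also handle fencing, split-brain, and failback policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    is_healthy: Callable[[], bool]  # hypothetical health probe


class FailoverGroup:
    """Active-passive group: one node serves, the rest stand by."""

    def __init__(self, nodes: List[Node]):
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = nodes
        self.active = nodes[0]  # the first node starts as primary

    def current(self) -> Node:
        """Return the serving node, promoting a standby if the active one fails."""
        if self.active.is_healthy():
            return self.active
        for node in self.nodes:
            if node is not self.active and node.is_healthy():
                self.active = node  # promotion; no automatic failback
                return node
        raise RuntimeError("no healthy node available")


# Simulated health state, toggled to trigger a failover.
health = {"primary": True, "standby": True}
group = FailoverGroup([
    Node("primary", lambda: health["primary"]),
    Node("standby", lambda: health["standby"]),
])
print(group.current().name)  # prints "primary"
health["primary"] = False
print(group.current().name)  # prints "standby"
```

In this sketch the promoted standby stays active even after the original primary recovers, which avoids a second switchover; whether to fail back is a design choice.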

HA vs Fault Tolerance

Feature	HA	Fault Tolerance
Downtime	Brief (seconds to minutes)	Zero (seamless failover)
Cost	Moderate	Very high
Complexity	Moderate	High
Use case	Most enterprise systems	Mission-critical (finance, healthcare)
Example	Load-balanced servers	Mirrored systems with instant failover

HA Components

Component	HA Strategy
Servers	Clustering, load balancing, failover
Storage	RAID, SAN replication, dual-controller arrays
Network	Redundant paths, link aggregation, VRRP/HSRP
Database	Replication, clustering, sharding
Power	UPS, dual power feeds, generators
Data Center	Multi-site deployment, geographic redundancy
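
The effect of redundancy at each layer can be estimated with the standard series/parallel availability formulas. The sketch below assumes component failures are independent, which real systems only approximate (shared power, software bugs, and operator error correlate failures); the function names are illustrative.

```python
from math import prod
from typing import Iterable


def series_availability(parts: Iterable[float]) -> float:
    """Availability of a chain in which every part must be up."""
    return prod(parts)


def parallel_availability(single: float, copies: int) -> float:
    """Availability of redundant copies where one working copy is enough.

    Assumes independent failures: the group is down only if all copies are down.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - single) ** copies


# Two redundant 99% servers behave like a single ~99.99% component.
servers = parallel_availability(0.99, 2)
# Chaining that pair with a single 99.9% database lowers the total again.
stack = series_availability([servers, 0.999])
print(f"server pair: {servers:.4%}, full stack: {stack:.4%}")
```

The series result shows why every layer in the table needs its own redundancy: the least available non-redundant component bounds the availability of the whole stack.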

HA Tools and Technologies

Tool/Technology	HA Function
Keepalived	VRRP for virtual IP failover
Pacemaker/Corosync	Cluster resource management
HAProxy	Load balancer with health checks
NGINX	Load balancer with upstream health checks (passive in open source; active checks in NGINX Plus)
MySQL Replication	Database master/replica HA
PostgreSQL Streaming Replication	Database HA
etcd	Distributed key-value store using Raft consensus for cluster state
Kubernetes	Pod scheduling, ReplicaSets, StatefulSets
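
The load balancers listed above combine traffic distribution with health checks. The following sketch shows the general idea of round-robin selection that skips unhealthy backends. It does not reproduce the actual algorithm of HAProxy or NGINX, and the class and method names are hypothetical.

```python
from typing import List, Set


class RoundRobinBalancer:
    """Round-robin selection over backends, skipping those marked down."""

    def __init__(self, backends: List[str]):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        self.healthy: Set[str] = set(backends)
        self._index = 0

    def mark_down(self, backend: str) -> None:
        """Record a failed health check for a backend."""
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        """Record a passing health check for a known backend."""
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self) -> str:
        """Return the next healthy backend in rotation."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._index]
            self._index = (self._index + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backend available")


lb = RoundRobinBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")  # simulate a failed health check
picks = [lb.next_backend() for _ in range(4)]
print(picks)  # prints ['web1', 'web3', 'web1', 'web3']
```

In a real deployment the mark_down and mark_up calls would be driven by periodic probes or by observed request failures rather than called by hand.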

HA in Kubernetes

Pod replicas: Multiple replicas of the same Deployment
Node redundancy: Multiple worker nodes
Control plane HA: Multiple API servers, etcd cluster quorum
Persistent volumes: Multi-path storage, replicated PVs
Service: Stable endpoint regardless of pod failures
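
The etcd quorum mentioned under control plane HA is a strict majority of cluster members. As a small illustration (the function names are made up for this sketch), the majority rule explains why odd cluster sizes are usually preferred:

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while the cluster still keeps quorum."""
    return members - quorum_size(members)


# Compare cluster sizes: quorum needed and failures survived.
for n in range(1, 8):
    print(f"{n} members: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)}")
```

A 4-member cluster tolerates one failure, the same as a 3-member cluster, so the extra member adds cost without improving fault tolerance.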

Related Concepts

Disaster Recovery: HA minimizes downtime; backups protect against data loss
Cloud: Providers offer multi-AZ (multiple availability zone) deployments for HA

References

Wikipedia: https://en.wikipedia.org/wiki/High_availability
NIST SP 800-34 (Contingency Planning Guide for Federal Information Systems): https://csrc.nist.gov/pubs/sp/800-34/final
AWS HA best practices: https://aws.amazon.com/architecture/high-availability/