High-Availability Dedicated Server Setup with Corosync and Pacemaker
A single dedicated server can go down because of hardware failures, network card issues, or kernel panics, and that downtime matters for everyone who depends on it. High Availability (HA) clusters solve this by linking two or more dedicated servers together with open-source cluster management software: when one server crashes, another takes over in seconds. In this guide, you will learn to build an HA Dedicated Server with Pacemaker and Corosync.
A modern Linux HA stack for dedicated servers consists of Corosync (cluster messaging), Pacemaker (resource manager), and pcs (CLI). With two or more dedicated servers, you can expose a floating virtual IP and run services that automatically fail over when one server crashes.
Follow the steps below to set up an HA Dedicated Server from PerLod Hosting with Pacemaker and Corosync on RHEL-based systems such as AlmaLinux and Rocky Linux. Differences for Debian-based distributions are noted along the way.
Architecture Overview: Build a HA Dedicated Server with Pacemaker and Corosync
Before starting to build a HA Dedicated Server, it is good to understand what each part of the HA stack actually does.
The components of an HA stack include:
Corosync: Cluster messaging and membership layer, which:
- Maintains the node list and who is in the cluster.
- Detects node failure and handles quorum.
- Broadcasts cluster state to all nodes.
Pacemaker: Cluster Resource Manager (CRM), which:
- Decides where each resource should run.
- Starts, stops, and monitors services through resource agents.
- Handles failover when a node or resource fails.
pcs: Command‑line interface used on RHEL‑like systems to:
- Configure Corosync and Pacemaker.
- Create resources such as IP, web server, DB, etc.
- Manage constraints, properties, fencing, etc.
Resource Agents (RA): Scripts that know how to start, stop, and monitor a service, usually in the OCF (Open Cluster Framework) standard.
STONITH or Fencing in Production: A safety feature that can forcefully power off or reboot a failed node using tools like IPMI, iDRAC, iLO, PDUs, or SBD. In testing labs, you can turn it off, but in real production clusters, you must configure it.
In this guide, we assume two dedicated servers running the same OS, such as AlmaLinux or Rocky Linux, for example:
- node1.example.com – 192.0.2.11
- node2.example.com – 192.0.2.12
- One service network: 192.0.2.0/24
- One virtual IP for clients: 192.0.2.100, which is not used anywhere else.
- Service: Nginx
Note: In Debian and Ubuntu, the concepts and setup are the same, with a few differences, including:
The installation:
sudo apt update
sudo apt install pacemaker corosync pcs resource-agents fence-agents -y
On modern Ubuntu, you usually manage Pacemaker clusters the same way as on RHEL-like systems. You can use pcsd as the helper service and hacluster as the cluster admin user.
In short, install the packages, enable pcsd, set the hacluster password, authenticate the nodes with pcs host auth, create the cluster with pcs cluster setup, start it with pcs cluster start, and then add your resources.
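As a minimal sketch on a modern Ubuntu or Debian release (assuming the same node names used later in this guide, and that your pcs version supports the current syntax), the flow looks like this:
sudo systemctl enable --now pcsd
sudo passwd hacluster
sudo pcs host auth node1.example.com node2.example.com
sudo pcs cluster setup ha-cluster node1.example.com node2.example.com
sudo pcs cluster start --all
sudo pcs cluster enable --all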
Prerequisites for Dedicated Servers To Build an HA Cluster
The first step is to prepare your dedicated servers by installing the required packages, setting hostnames, and configuring the firewall.
On all nodes, run the system updates and install the required tools with the commands below:
sudo dnf update -y
sudo dnf install vim net-tools curl chrony -y
Enable the Chrony service to ensure time is synced:
sudo systemctl enable --now chronyd
Set the correct hostnames with the commands below:
sudo hostnamectl set-hostname node1.example.com #on node1
sudo hostnamectl set-hostname node2.example.com #on node2
Configure the /etc/hosts file on both nodes:
sudo vi /etc/hosts
Nodes must be able to ping each other by hostname:
192.0.2.11 node1.example.com node1
192.0.2.12 node2.example.com node2
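For example, from node1 you can confirm that node2 resolves and responds:
ping -c 2 node2
ping -c 2 node2.example.com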
We assume you have firewalld installed and enabled on your servers. Allow the Pacemaker, Corosync, and pcsd Web UI traffic through your firewall with the commands below:
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --permanent --add-port=2224/tcp # pcsd
sudo firewall-cmd --permanent --add-port=3121/tcp # Pacemaker remote (if used)
sudo firewall-cmd --reload
It is recommended to leave SELinux in enforcing mode in production. In a testing lab, you can switch it to permissive temporarily and set it back to enforcing before going to production:
sudo setenforce 0
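You can confirm the current SELinux mode with getenforce, and switch back to enforcing once you are done testing:
getenforce
sudo setenforce 1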
Install Pacemaker, Corosync, and pcs
Once you are done with preparing your dedicated servers, you must install Pacemaker, Corosync, and pcs on all nodes.
To install them, use the command below:
sudo dnf install pacemaker corosync pcs fence-agents-all resource-agents -y
Enable and start the pcs API service with the command below:
sudo systemctl enable --now pcsd
Verify the service status is up and running:
sudo systemctl status pcsd
The pcsd service uses a dedicated system user, “hacluster”, to authenticate between nodes. On each node, you must set a password for the hacluster user with the command below:
sudo passwd hacluster
Enter the same strong password on all nodes. You will use this password once to let the pcs host auth distribute authentication keys.
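On RHEL-like systems, you can also set the password non-interactively, which is handy for automation; the password below is just a placeholder:
echo "YourStrongPassword" | sudo passwd --stdin hacluster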
Create the HA Cluster for Dedicated Servers
Before the cluster can manage any services, the nodes need to trust each other and share a common configuration. You must authenticate the servers so a single pcs session can talk to the whole cluster, then create the Corosync and Pacemaker configuration and bring the cluster online.
1. Authenticate nodes:
From one node, for example node1, run the command below:
sudo pcs host auth node1.example.com node2.example.com
You will be asked for the username and password. Enter hacluster and its password.
This will connect to pcsd on each node using hacluster credentials and exchange keys so subsequent pcs commands run securely on all nodes.
Verify authentication with the command below:
sudo pcs status pcsd
You should see each node “Online”.
2. Create the cluster: Corosync and Pacemaker Configuration
From the same node, run the command below:
sudo pcs cluster setup ha-cluster node1.example.com node2.example.com
Note: On older pcs 0.9 releases (for example, CentOS/RHEL 7), the equivalent syntax is pcs cluster setup --name ha-cluster node1.example.com node2.example.com.
This command:
- Generates /etc/corosync/corosync.conf on each node.
- Registers Pacemaker as the cluster resource manager.
- Configures Corosync to use the listed nodes as the cluster membership.
Check the generated Corosync config with the command below:
sudo cat /etc/corosync/corosync.conf
In your output, you must see sections like this:
- totem: Token ring protocol settings.
- nodelist: List of nodes with their addresses.
- quorum: Quorum policy.
- logging: Log output options.
3. Start and enable the cluster:
Start cluster services on all nodes with the command below:
sudo pcs cluster start --all
Enable the cluster services at boot with the following command:
sudo pcs cluster enable --all
Check the HA cluster status with the command below:
sudo pcs status
In your output, you must see something similar to this:
- Stack: corosync
- Current DC: one of the nodes
- Nodes: both “Online”
- Daemon Status: corosync: active/enabled, pacemaker: active/enabled, pcsd: active/enabled
Safety Rules for Two‑Node HA Cluster: STONITH and Quorum
Here are the key rules that you must consider for your HA cluster:
In a two‑node HA cluster, classical quorum voting does not work well: if one node fails, the survivor holds only one of two votes and technically has no majority. It is recommended to:
Use the special two‑node quorum rule (two_node: 1 in Corosync), or disable quorum enforcement and rely strictly on fencing (STONITH).
On most modern RHEL-based systems, pcs usually sets up a reasonable two-node cluster by default, but you still need to configure fencing (STONITH) yourself to make it safe and reliable.
For a testing environment only, you can disable STONITH and ignore quorum. To disable fencing, run the command below:
sudo pcs property set stonith-enabled=false
In a two-node cluster, allow resources without quorum with the command below:
sudo pcs property set no-quorum-policy=ignore
Check the properties with the command below:
sudo pcs property config
You should see “stonith-enabled=false” and “no-quorum-policy=ignore”.
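You can also inspect the live quorum state with corosync-quorumtool, which shows the vote count and whether the two-node flag is active:
sudo corosync-quorumtool -s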
In a production environment, you must configure STONITH. On dedicated servers, you typically have:
- IPMI (BMC) interfaces: fence_ipmilan
- Vendors: Dell iDRAC, HPE iLO, etc.: specific fence agents exist
- PDUs: Power port fencing via SNMP, etc.
Here is a conceptual example of STONITH via IPMI for node1 (the IPMI address and credentials are placeholders):
sudo pcs stonith create fence-node1 fence_ipmilan \
pcmk_host_list="node1.example.com" \
ipaddr="192.0.2.201" login="ipmiuser" passwd="StrongIPMIPassword" \
lanplus=1 power_timeout=10 op monitor interval=60s
Ensure the property is set correctly:
sudo pcs property set stonith-enabled=true
Also set an appropriate no-quorum-policy; in multi‑node production clusters, this is typically stop or freeze rather than ignore.
Important note: Do not run production without fencing.
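Once fencing is configured, you can review it and, during a maintenance window, test it carefully; remember that fencing a node really does reboot or power it off:
sudo pcs stonith config
sudo pcs stonith fence node2.example.com # WARNING: this powers off or reboots node2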
Add HA Cluster Resources
The HA cluster is now up and running, but it still won’t keep any application online until you tell it what to manage. You can add a floating virtual IP (VIP) and a web service like Nginx as your cluster resources.
1. Create a floating virtual IP (VIP): The clients will always connect to the VIP.
Choose an unused IP in your subnet, for example, 192.0.2.100. On one node, run the command below:
sudo pcs resource create virtual_ip \
ocf:heartbeat:IPaddr2 \
ip=192.0.2.100 cidr_netmask=24 \
op monitor interval=30s
Command explanation:
- ocf:heartbeat:IPaddr2: Standard OCF RA for IPv4 addresses.
- ip: The floating IP address.
- cidr_netmask: Subnet size.
- op monitor interval=30s: Cluster checks every 30s that the IP is correctly configured.
Check the resources with the command below:
sudo pcs status resources
In the output, you should see:
virtual_ip (ocf::heartbeat:IPaddr2) # running on one node.
You can confirm VIP on the running node:
ip a s
You must see 192.0.2.100 assigned to one of the interfaces.
2. Install and configure Nginx:
On both nodes, install and enable Nginx with the commands below:
sudo dnf install nginx -y
sudo systemctl enable --now nginx
Then, you must modify the Nginx default index so you can see which node served a request.
From node1, run the command below:
echo "Served from node1" | sudo tee /usr/share/nginx/html/index.html
From node2, run the command below:
echo "Served from node2" | sudo tee /usr/share/nginx/html/index.html
Next, stop Nginx and disable its systemd unit so that Pacemaker, not systemd, decides when it runs:
sudo systemctl stop nginx
sudo systemctl disable nginx
3. Add Nginx as a Pacemaker resource:
Use the command below to create an Nginx resource using the built‑in OCF agent:
sudo pcs resource create web_server \
ocf:heartbeat:nginx \
configfile="/etc/nginx/nginx.conf" \
op start timeout=40s interval=0 \
op stop timeout=60s interval=0 \
op monitor timeout=20s interval=30s on-fail=restart
Command Explanation:
- ocf:heartbeat:nginx: OCF RA that knows how to manage Nginx.
- configfile: Path to main Nginx config.
- op lines: Define start, stop, and monitor operations.
- monitor interval=30s: Health check every 30s.
- on-fail=restart: Pacemaker restarts Nginx locally if a health check fails.
Verify the resources with the command below:
sudo pcs status resources
In your output, you must see:
- virtual_ip is running somewhere.
- web_server running, maybe on a different node at first.
At this point, Pacemaker may choose to run the VIP on one node and Nginx on the other. But we want VIP and Nginx together; to do this, you must group resources and add constraints.
Group HA Resources and Add Constraints
Clients should always land on a node that actually runs your web service, not just the virtual IP. At this point, you can tell the cluster to treat the VIP and Nginx as a single unit so they always move together.
By placing both resources into a group and then testing with a simple HTTP request, you can verify that traffic to the VIP is always served by a healthy node running Nginx, regardless of which server currently owns the group.
1. Create a resource group: The group ensures resources in it always run on the same node, and start or stop in group order.
Create a group that holds the VIP and Nginx with the command below:
sudo pcs resource group add web_group virtual_ip web_server
Check the resources with the command below:
sudo pcs status resources
Now you should see something similar to this:
Resource Group: web_group including virtual_ip and web_server.
Pacemaker will:
- Start virtual_ip, then web_server on the same node.
- Stop them in reverse order.
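If you prefer explicit constraints over a group, the same intent can be expressed with a colocation constraint and an ordering constraint, for example:
sudo pcs constraint colocation add web_server with virtual_ip INFINITY
sudo pcs constraint order virtual_ip then web_server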
2. Confirm client access: From a client or one node, run the command below to confirm the client access:
curl http://192.0.2.100/
You should see either “Served from node1” or “Served from node2”.
To check which node holds the group, run the command below:
sudo pcs status resources
The group’s node will match the message.
HA Failover Testing: Confirm HA Cluster Works Correctly
At this point, you can run failover testing to verify that the VIP and Nginx resource group move correctly between nodes.
1. Manual migration: You can move the group to the other node with the command below:
sudo pcs resource move web_group node2.example.com
Check it with the command below:
sudo pcs status
After a few seconds, the group should run on node2.
Then, clear the manual move, which lets Pacemaker decide again:
sudo pcs resource clear web_group
This removes temporary location constraints and returns the cluster to normal policy.
2. Node standby: You can put a node into standby mode, for example for planned maintenance, with the command below:
sudo pcs node standby node1.example.com
The cluster will migrate web_group to node2 and mark node1 as “Standby”.
You can return it with the following command:
sudo pcs node unstandby node1.example.com
Standby is ideal for planned maintenance such as kernel upgrades, etc.
3. Simulating failure: To simulate a crash, for example, on node1, you can run the command below:
sudo systemctl stop pacemaker
# or
sudo poweroff
Check from node2 with the command below:
sudo pcs status
You must see that node1 is offline and web_group started on node2.
Note: If STONITH is configured, Pacemaker may fence the failed node depending on the configuration.
When the broken server comes back online and joins the cluster again, the services usually stay where they are. They won’t automatically move back unless you set rules that tell the cluster to prefer that node.
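For example, if you want web_group to prefer node1 again once it rejoins, you can add a location preference (the score of 50 is only an illustration):
sudo pcs constraint location web_group prefers node1.example.com=50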
Essential pcs Commands: Status, Resources, and Failover Control
In this step, you can explore the most commonly used commands and what each one is for, so you can troubleshoot faster.
1. Cluster status and configuration:
| pcs command | Explanation |
|---|---|
| sudo pcs status | Show overall status (nodes, resources, daemons). |
| sudo pcs status resources | List resources. |
| sudo pcs status nodes | Show cluster nodes. |
| sudo pcs property config | Show cluster properties. |
| sudo pcs resource config | Show the exact resource configuration. |
2. Manage resources:
| pcs command | Explanation |
|---|---|
| sudo pcs resource stop web_server | Stop a specific resource. |
| sudo pcs resource start web_server | Start a specific resource. |
| sudo pcs resource disable web_server | Disable a specific resource. |
| sudo pcs resource enable web_server | Enable a specific resource. |
| sudo pcs resource delete web_server | Delete a resource completely. |
Note: Deleting a resource from a group changes group membership.
3. Constraints:
| pcs command | Explanation |
|---|---|
| sudo pcs constraint config | Show all constraints. |
| sudo pcs constraint location web_group prefers node1.example.com=100 | Example: force web_group to prefer node1. |
| sudo pcs constraint delete constraint-ID | Remove a specific constraint. |
Note: You can see constraint IDs in the pcs constraint config output.
4. Cluster control:
| pcs command | Explanation |
|---|---|
| sudo pcs cluster stop --all | Stop clustering everywhere. |
| sudo pcs cluster start --all | Start clustering everywhere. |
| sudo pcs cluster disable --all | Disable autostart. |
Corosync Config Basics: What Does the HA Cluster File Mean?
Corosync is the communication layer of your HA cluster, and its main configuration file is /etc/corosync/corosync.conf. In a pcs-managed setup, you should not edit this file by hand, because pcs generates it and keeps it consistent across all nodes.
Here you can explore the key fields to understand it better.
A typical Corosync Configuration file looks like:
totem {
version: 2
secauth: on
cluster_name: ha-cluster
transport: knet
}
nodelist {
node {
name: node1.example.com
nodeid: 1
ring0_addr: 192.0.2.11
}
node {
name: node2.example.com
nodeid: 2
ring0_addr: 192.0.2.12
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
logging {
to_syslog: yes
to_logfile: yes
logfile: /var/log/corosync/corosync.log
timestamp: on
}
Key points include:
totem: Main protocol settings.
- transport: knet: Modern multi‑transport; supports multiple links.
- cluster_name: Cluster identifier, which must be the same on all nodes.
- Other timing parameters, such as token, consensus, etc., were kept at defaults.
nodelist: Each node includes:
- name: Must match the system hostname Pacemaker sees.
- nodeid: Unique numeric ID.
- ring0_addr: Address used for Corosync traffic.
quorum: Quorum algorithm; two_node: 1 is typically set automatically in two-node configs.
logging: Where logs go.
To view the cluster configuration in the supported way (changes should likewise go through pcs, for example pcs cluster config update, rather than manual edits), you can run:
sudo pcs cluster config
Advanced HA Best Practices for Dedicated Servers
At this point, you have a working HA cluster; you can make it stronger and more production-ready on dedicated servers by protecting the cluster from network link failures, planning for shared data and databases, and adding monitoring so you know about problems before users do.
1. Dual‑ring or redundant Corosync links: On dedicated servers with multiple NICs, you can use multiple rings (links) to avoid losing the cluster when one network link fails.
For example:
- ring0 is the primary cluster network.
- ring1 is the secondary cluster or service network.
You can configure this with pcs cluster config update, adding multiple link definitions for knet; the exact link syntax varies between pcs versions, so check the pcs(8) man page for your release. For example:
sudo pcs cluster config update \
transport knet \
link linknumber=0 \
link linknumber=1
sudo pcs cluster link update 0 priority=255
sudo pcs cluster link update 1 priority=100
This ensures Corosync prefers one link but can fail over to another.
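You can check the health of each Corosync link at any time with corosync-cfgtool:
sudo corosync-cfgtool -s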
2. Plan storage and databases for stateful services: For stateless services, such as Nginx acting as a reverse proxy, this cluster setup is enough. For stateful services:
- Use shared storage like iSCSI, FC, NFS, Ceph, etc., with proper RA for the mount, FS, and DB.
- Or you can use DRBD (block replication) and cluster a filesystem on top.
- Databases often have special resource agents and fencing requirements.
These setups are similar at the Pacemaker level but require extra storage design.
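As a hedged sketch, a shared filesystem could be added to the existing group with the ocf:heartbeat:Filesystem agent; the device path and mount point below are placeholders for your own storage design:
sudo pcs resource create shared_fs \
ocf:heartbeat:Filesystem \
device="/dev/sdb1" directory="/var/www" fstype="xfs" \
op monitor interval=20s \
--group web_group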
3. Monitoring and alerts: Pacemaker provides an RA “ocf:pacemaker:ClusterMon” that runs crm_mon in the background and can output HTML or trigger external scripts on cluster events.
For example:
sudo pcs resource create cluster_monitor \
ocf:pacemaker:ClusterMon \
update=60s extra_options="-E /usr/local/bin/ha-alert.sh"
Your script receives environment variables describing the event and can send email, push to monitoring, etc.
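A minimal /usr/local/bin/ha-alert.sh could look like the sketch below; it simply logs the CRM_notify_* environment variables that crm_mon passes to the external agent (adapt it to send email or call your monitoring API), and it must be executable (chmod +x):
#!/usr/bin/env bash
# Hypothetical alert hook: append each cluster event to a log file.
echo "$(date) node=${CRM_notify_node} rsc=${CRM_notify_rsc} task=${CRM_notify_task} desc=${CRM_notify_desc}" >> /var/log/ha-alerts.log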
FAQs
Can I use just one server instead of two for building an HA Cluster?
No. A single server is not high availability; it has no failover capability. You need at least two nodes so that when one fails, the other can take over services.
Can I have multiple HA resource groups or more complex constraints?
Yes. You can create multiple groups, apply location constraints to control where each runs, colocation constraints to keep groups together, and ordering constraints to control startup sequence.
What happens if I reboot one node? Will the HA cluster survive?
Yes. When a node reboots, Corosync and Pacemaker are normally set to auto-start at boot. The cluster will continue to run services on the surviving node.
Conclusion
At this point, you have learned to build and manage a functional HA cluster on dedicated servers by setting up a two-node cluster, creating floating virtual IPs, managing services as cluster resources, and testing failover safely.
We hope you enjoy this guide on HA Dedicated Server with Pacemaker and Corosync.
Subscribe to our X and Facebook channels to get the latest articles and updates.