Proxmox Cluster with Firewall VM

By Edward, Published January 14 2024, Updated March 1 2024

Overview

Setting up Proxmox into a cluster is easy when your firewall appliance sits independently of Proxmox. However, in my setup, Proxmox hosts my firewall within a VM. Adding the Proxmox server (or node) into a cluster results in a glaring issue.

Proxmox assigns every VM an ID that must be unique across the cluster, and conflicting IDs cause major issues. For that reason, the joining node must have no VMs set up when it joins a cluster.

Proxmox also relies on a service called Corosync to keep VM information up to date across nodes. If a node cannot synchronize with a majority of the cluster nodes, it locks down and blocks any changes, so that no VM can be added or modified in a way that would violate the synchronization policies. This majority check is referred to as the quorum. That is the glaring issue: the firewall VM cannot start without quorum, and the node cannot reach the rest of the cluster to gain quorum until the firewall VM is running.
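To make the majority rule concrete, here is a small sketch of the arithmetic. It assumes every node carries exactly one vote, and quorum_needed is a name I made up for illustration, not a Proxmox command.

```shell
#!/bin/bash
# Votes needed for quorum: a strict majority of the expected votes.
# Assumption: one vote per node.
quorum_needed() {
    local expected_votes=$1
    echo $(( expected_votes / 2 + 1 ))
}

# Print the majority for a few cluster sizes.
for nodes in 1 2 3 5; do
    echo "nodes=${nodes} votes_needed=$(quorum_needed "${nodes}")"
done
```

In a two-node cluster the majority is two votes, so a node that boots alone is never quorate on its own. That is why the expected vote count has to be lowered to one before the firewall VM can start.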


Lack of Troubleshooting

Joining a cluster may throw a few errors, but within a few minutes it should start working. If it does not, it is a good time to start researching the symptoms. Unfortunately, I ran into so many errors just trying to join the cluster that I had to nuke the joiner several times and make the creator forget the joiner ever existed. Writing a guide that covers all of those symptoms would not be practical. Therefore, I will simply focus on joining a cluster successfully and setting up the quorum to allow the firewall VM to boot.


Before Adding to Cluster

Ensure that there’s a VPN policy allowing communication between the two cluster nodes. Once that is in place, we will set up the firewall VM options and the bypass script before adding the node to the cluster. This will help avoid any potential issues with booting up the hypervisor.

Next, open up


Bypassing the Quorum

In a setup where the Proxmox node is to be added to a cluster and is hosting the main WAN firewall as a VM, the quorum needs to be temporarily bypassed. To set up the bypass, we first want to make sure that the firewall VM is set to start when the node boots. Select the VM, then go to Options and select Start at boot. Check the box. Next, select Start/Shutdown order and set it to 1.
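If you prefer the shell over the web UI, the same two options can probably be set with qm. This is a sketch rather than the method used in this guide: the function name set_firewall_autostart and the VM ID 100 are placeholders of mine, so substitute your firewall VM's ID.

```shell
#!/bin/bash
# Enables "Start at boot" and sets the start order to 1 for one VM.
# Hypothetical helper; the argument is the VM ID of the firewall VM.
set_firewall_autostart() {
    local vmid=$1
    qm set "${vmid}" --onboot 1 --startup order=1
}

# Usage on the Proxmox node (100 is a placeholder VM ID):
#   set_firewall_autostart 100
```

Nothing runs until the function is called, so the snippet is safe to paste and review first.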

Next, connect to your node via SSH. Log in as root and create a file within root’s home directory. In this example, the file will be called quorumToOne.sh. Copy and paste the code in this section into it.

Run the command below to make the file executable:

chmod ug+x /root/quorumToOne.sh

/root/quorumToOne.sh:
#!/bin/bash

# Sets quorum count to one temporarily
pvecm expected 1

# Waits 30 seconds before killing cluster sync service
# This allows the VMs to start booting
sleep 30
systemctl stop corosync

# Waits an additional 60 seconds to restart quorum sync service
sleep 60
systemctl start corosync

Setting Up the Quorum Service

Next, we need to set up the quorum script as a service. To do so, create and edit a new file within /etc/systemd/system. In this example, we will create /etc/systemd/system/quorumToOne.service.

Copy the code in this section to the new file. Make sure to include all of the Wants= and After= lines so that the quorum is not set too early, before Corosync starts, which would negate the effect of the script.

After the unit file is in place, run the following commands:

systemctl daemon-reload
systemctl status quorumToOne.service

If all is well there, go ahead and enable the service on boot.

systemctl enable quorumToOne.service

/etc/systemd/system/quorumToOne.service:
[Unit]
Description=Set Quorum to one for boot
Wants=multi-user.target
Wants=pvestatd.service
Wants=pveproxy.service
Wants=spiceproxy.service
Wants=pve-firewall.service
Wants=lxc.service
After=pveproxy.service
After=pvestatd.service
After=spiceproxy.service
After=pve-firewall.service
After=lxc.service
After=pve-ha-crm.service pve-ha-lrm.service
After=corosync.service

[Service]
User=root
Type=simple
ExecStart="/bin/bash" "/root/quorumToOne.sh"

[Install]
WantedBy=multi-user.target

Adding to Cluster

Adding to the cluster should be fairly straightforward, but I have run into a bunch of error codes trying to set up the cluster. To keep things simple, we’ll call node A the cluster creator and node B the cluster joiner.

Node A Setup

Run the following command, just for good luck.

pvecm updatecerts --force

That should prevent any further issues from arising during the join.

On node A, select Datacenter > Cluster > Create Cluster. Accept the defaults and create the new cluster. Once the Create Cluster window is closed, select Join Information, then select Copy Information.

Node B Setup

Run the following command, just for good luck.

pvecm updatecerts --force

That should prevent any further issues from arising during the join.

On node B, select Datacenter > Cluster > Join Cluster. Paste the cluster information in. If you are unable to copy and paste, type in the cluster information manually. Finally, click Join. With any luck, it should go through just fine.
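Once the join has finished, a quick way to confirm it is to look at the output of pvecm status on either node. The helper below is only a sketch: is_quorate is a name I made up, and it assumes the status output contains a line of the form "Quorate: Yes".

```shell
#!/bin/bash
# Reads `pvecm status` output on stdin and succeeds if the node reports quorum.
# Assumption: the output contains a line starting with "Quorate:" followed by Yes or No.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Usage on a node:
#   pvecm status | is_quorate && echo "cluster is quorate"
```

Running pvecm nodes as well lists the members that the cluster currently knows about.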


Final Notes

Before deploying either system into production, it is highly advisable to test the stability of the system. Reboot node A and verify that the firewall VM boots correctly. If it does not, check the system logs to see whether quorumToOne.service failed, and confirm that the VM is set to start at boot.
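The checks above can be sketched from the shell as follows. The helper name starts_on_boot and the VM ID 100 are placeholders of mine, and the helper assumes that qm config prints the Start at boot option as a line reading "onboot: 1".

```shell
#!/bin/bash
# Reads `qm config <vmid>` output on stdin and succeeds if Start at boot is enabled.
# Assumption: the option appears as a line reading "onboot: 1".
starts_on_boot() {
    grep -Eq '^onboot:[[:space:]]*1$'
}

# Usage on the node (100 is a placeholder VM ID):
#   qm config 100 | starts_on_boot || echo "firewall VM is not set to start at boot"
#   systemctl is-enabled quorumToOne.service
#   journalctl -b -u quorumToOne.service --no-pager
```

The journalctl line shows what the service logged during the current boot, which is usually the fastest way to see whether the script ran at all.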