2 Node Cluster

Overview

In this scenario, we are going to set up two vPlus servers in High Availability, Active/Passive mode. This is achieved with Pacemaker, Corosync, and DRBD. At least a basic understanding of these tools is highly desirable. This how-to is intended for RPM-based systems such as Red Hat / CentOS. If you run vPlus on a different OS, you may need to refer to your distribution docs.

Our environment is built of the following elements:

  1. vplus1 - first vPlus server + vPlus node, IP: 10.40.1.50 (host name vprotect1 in the commands below)

  2. vplus2 - second vPlus server + vPlus node, IP: 10.40.1.52 (host name vprotect2 in the commands below)

  3. Cluster IP: 10.40.1.100 - We will use this IP to connect to our active vPlus service. This IP will float between our servers and will point to an active instance.

  4. DRBD (optionally with VDO) for data replication and deduplication between nodes.

  5. MariaDB master <-> master replication

![](<../.gitbook/assets/overview-high_availability (1) (1) (1) (1) (1) (1) (1).png>)

HA cluster setup

Preparing the environment

  • Disable automatic startup of the vPlus server, node and database, as the cluster will manage these resources.

systemctl disable vprotect-server vprotect-node mariadb
  • Use yum to check if you have any updates pending

yum update
  • It is a good idea to check /etc/hosts, especially if you installed vPlus using the All in one quick installation method, as you might find an entry such as:

    127.0.0.1 <your_hostname_here>

    Delete it as this prevents the cluster from functioning properly (your nodes will not "see" each other).
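The check described above can also be scripted. The sketch below is only an illustration: the function name `loopback_entries` is hypothetical, and it assumes a standard hosts file layout (address first, names after it).

```shell
# Print every line of a hosts file that maps a loopback address to the given
# host name. Defaults: /etc/hosts and the output of uname -n.
loopback_entries() (
    hosts_file="${1:-/etc/hosts}"
    host_name="${2:-$(uname -n)}"
    awk -v h="$host_name" '
        /^[ \t]*#/ { next }                       # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break              # ignore trailing comments
                if ($i == h) { print; break }
            }
        }
    ' "$hosts_file"
)

# Usage: any output means such an entry is present and should be removed.
# loopback_entries /etc/hosts "$(uname -n)"
```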

Now we can proceed with installation of the required packages.

  • On both servers run

yum install -y pacemaker pcs psmisc policycoreutils-python
  • Add a firewall rule to allow HA traffic - TCP ports 2224, 3121, and 21064, and UDP port 5405 (both servers)

firewall-cmd --permanent --add-service=high-availability
# success
firewall-cmd --reload
# success

While testing, depending on your environment, you may encounter problems related to network traffic, permissions, etc. It may help to temporarily disable the firewall and SELinux, but we do not recommend disabling these mechanisms in a production environment, as doing so creates significant security issues. If you choose to disable the firewall, bear in mind that vPlus will no longer be available on ports 80/443. Instead, connect to ports 8080/8181 respectively.

setenforce 0
sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
systemctl mask firewalld.service
systemctl stop firewalld.service
iptables --flush

Enable and start PCS daemon

systemctl enable pcsd.service
systemctl start pcsd.service

Cluster configuration

Installing the pcs package earlier automatically created a user named hacluster with password authentication disabled. While this may be good for running locally, we will require a password for this account to perform the rest of the configuration, so let's

  • configure the same password on both nodes

passwd hacluster
#Changing password for user hacluster.
#New password:
#Retype new password:
#passwd: all authentication tokens updated successfully.

Corosync configuration

  • On node 1, issue a command to authenticate as the hacluster user:

pcs cluster auth vprotect1 vprotect2
#Username: hacluster
#Password:
#vprotect1: Authorized
#vprotect2: Authorized
  • Generate and synchronize the corosync configuration

pcs cluster setup --name mycluster vprotect1 vprotect2

Take a look at your output, which should look similar to the output below:

Destroying cluster on nodes: vprotect1, vprotect2...
vprotect1: Stopping Cluster (pacemaker)...
vprotect2: Stopping Cluster (pacemaker)...
vprotect1: Successfully destroyed cluster
vprotect2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'vprotect1', 'vprotect2'
vprotect1: successful distribution of the file 'pacemaker_remote authkey'
vprotect2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
vprotect1: Succeeded
vprotect2: Succeeded

Synchronizing pcsd certificates on nodes vprotect1, vprotect2...
vprotect1: Success
vprotect2: Success
Restarting pcsd on the nodes in order to reload the certificates...
vprotect1: Success
vprotect2: Success
  • Enable and start your new cluster

pcs cluster start --all && pcs cluster enable --all
#vprotect1: Starting Cluster (corosync)...
#vprotect2: Starting Cluster (corosync)...
#vprotect1: Starting Cluster (pacemaker)...
#vprotect2: Starting Cluster (pacemaker)...
#vprotect1: Cluster Enabled
#vprotect2: Cluster Enabled

OK! We have our cluster enabled. We have not created any resources (such as a floating IP) yet, but before we proceed we still have a few settings to modify.

Because we are using only two nodes, we need to

  • disable default quorum policy

(this command should not return any output)

pcs property set no-quorum-policy=ignore

We should also

  • define default failure settings

pcs resource defaults failure-timeout=30s
pcs resource defaults migration-threshold=3

These two settings combined will define how many failures can occur for a node to be marked as ineligible for hosting a resource and after what time this restriction will be lifted. We define the defaults here, but it may be a good idea to also set these values at the resource level, depending on your experience.
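To make the interaction of the two settings concrete, here is a deliberately simplified model written as a shell function. It is not Pacemaker code: the function name and the rule that all earlier failures are forgotten once failure-timeout has elapsed since the most recent failure are assumptions made for illustration, and the real cluster also depends on when it re-evaluates its state.

```shell
# Simplified model, not Pacemaker code: report whether a node is currently
# ineligible to run a resource.
#   $1 = migration-threshold, $2 = failure-timeout in seconds,
#   $3 = current time, remaining arguments = failure timestamps (ascending).
node_is_banned() (
    threshold="$1"; timeout="$2"; now="$3"
    shift 3
    count=$#
    last=0
    for t in "$@"; do last="$t"; done
    # Assumption: once failure-timeout has elapsed since the most recent
    # failure, earlier failures are forgotten.
    if [ "$count" -gt 0 ] && [ $(( now - last )) -ge "$timeout" ]; then
        count=0
    fi
    [ "$count" -ge "$threshold" ]
)

# Usage with the defaults set above (threshold 3, timeout 30s):
# node_is_banned 3 30 20 0 5 10 && echo "node is ineligible at t=20"
```

With three failures at 0, 5 and 10 seconds, the model marks the node as ineligible at t=20 and lifts the restriction once 30 seconds have passed since the last failure.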

As long as we are not using any fencing device in our environment (and here we are not) we need to:

  • disable stonith

pcs property set stonith-enabled=false && crm_verify -L

The second part of this command verifies the running configuration. These commands normally do not return any output.

Resource creation

Finally, we have our cluster configured, so it's time to proceed to

  • resource creation

First, we will create a resource that represents our floating IP 10.40.1.100. Adjust your IP and cidr_netmask, and you're good to go.

IMPORTANT: From this moment on we need to use this IP when connecting to our vPlus server.

pcs resource create "Failover_IP" ocf:heartbeat:IPaddr2 ip=10.40.1.100 cidr_netmask=22 op monitor interval=30s

Immediately, we should see our IP is up and running on one of the nodes (most likely on the one we issued this command on).

ip a
#[..]
#2: ens160:  mtu 1500 qdisc mq state UP group default qlen 1000
#    link/ether 00:50:56:a6:9f:c6 brd ff:ff:ff:ff:ff:ff
#    inet 10.40.1.50/22 brd 10.40.3.255 scope global ens160
#       valid_lft forever preferred_lft forever
#    inet 10.40.1.100/22 brd 10.40.3.255 scope global secondary ens160
#       valid_lft forever preferred_lft forever
#    inet6 fe80::250:56ff:fea6:9fc6/64 scope link
#       valid_lft forever preferred_lft forever

As you can see, our floating IP 10.40.1.100 has been successfully assigned as the second IP of interface ens160. This is what we wanted!
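When adjusting the IP and cidr_netmask for your own network, it can help to confirm that the floating IP falls into the same network as the interface address. The helper names below are hypothetical; this is a minimal sketch using plain shell arithmetic, shown with the addresses from this example.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed when both addresses belong to the same network for the given prefix.
#   $1 = first address, $2 = second address, $3 = prefix length (cidr_netmask)
same_subnet() (
    prefix="$3"
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
)

# Usage: node address, floating IP, and the cidr_netmask value.
# same_subnet 10.40.1.50 10.40.1.100 22 && echo "same network"
```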

We should also check if the vPlus web interface is up and running. We can do this by opening the web browser and typing in https://10.40.1.100.

The next step is to

  • define a resource responsible for monitoring network connectivity

pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=10.40.0.1 clone
pcs constraint location Failover_IP rule score=-INFINITY pingd lt 1 or not_defined pingd

Note that you need to use your gateway IP in the host_list parameter

Finally, we have to define a set of cluster resources responsible for other services crucial for vPlus, such as the vPlus node and the vPlus server itself. We will logically link these services with our floating IP. Whenever the floating IP disappears from our server, these services will be stopped. We also have to define the proper order for services to start and stop, as, for example, starting the vPlus server without a running database makes little sense.

  • Resource creation

pcs resource create "vProtect-node" systemd:vprotect-node op monitor timeout=300s on-fail="stop" --group vProtect-group
pcs resource create "vProtect-server" service:vprotect-server op start on-fail="stop" timeout="300s" op stop timeout="300s" on-fail="stop" op monitor timeout="300s" on-fail="stop" --group vProtect-group

It is OK for these commands not to return any output.

  • Resource colocation

pcs constraint colocation add Failover_IP with vProtect-group

Finally, we can set which server is preferred for running our services

  • Set node preference

pcs constraint location Failover_IP prefers vprotect1=INFINITY
pcs constraint location vProtect-group prefers vprotect1=INFINITY

We have made it to the end. At this point, our pacemaker HA cluster is functional.

However, there are still two things we need to consider, that is:

  1. Creating DB replication

  2. Setting up DRBD for /vprotect_data (optionally with VDO)

Setting up VDO+DRBD

In this section, we will prepare our deduplicated and replicated filesystem mounted in /vprotect_data.

Using a deduplicated FS is optional but highly recommended. If you don't intend to use it, skip the part regarding VDO configuration.

Note: If you are altering an existing vPlus configuration, it is very important to preserve the /vprotect_data contents and transfer them to the new filesystem. You may also need to re-create your backup_destination if you previously had one in this directory. Setting up VDO and DRBD will cause all data to be wiped from the configured volume.

Installation is split into the steps below that you need to follow to get the job done.

  • Stop the vPlus server and node

systemctl stop vprotect-server vprotect-node

No output means everything went OK.

  • On both nodes install the required repositories and packages

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
#Retrieving https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
#Preparing...                          ################################# [100%]
#Updating / installing...
#   1:elrepo-release-7.0-4.el7.elrepo  ################################# [100%]

The next command can produce quite a few lines, so the output is truncated here. The idea is simple: install the DRBD packages:

yum install -y kmod-drbd84 drbd84-utils

#Installed:
#drbd84-utils.x86_64 0:9.6.0-1.el7.elrepo                                               kmod-drbd84.x86_64 0:8.4.11-1.1.el7_6.elrepo

If you have not disabled SELinux and the firewall, remember to

  • configure them on both nodes

    semanage permissive -a drbd_t
    firewall-cmd --add-port=7788/tcp --permanent
    #success
    firewall-cmd --complete-reload
    #success

    Don't forget to repeat these steps on the second node

Now that we have the necessary software installed, we must prepare an identical size block device on both nodes. A block device can be a hard drive, a hard drive partition, software RAID, LVM Volume, etc. In this scenario, we are going to use a hard drive connected as /dev/sdb.

To add a DRBD resource we create the file /etc/drbd.d/vprotect.res with the content below. Be sure to change the "address" so that it reflects your network configuration.

Also, the node names (vprotect1 and vprotect2 in this example) must match your uname -n output.

resource replicate {
    protocol C;
    on vprotect1 {
        device /dev/drbd0;
        disk /dev/sdb;
        address 10.40.1.50:7788;
        meta-disk internal;
    }
    on vprotect2 {
        device /dev/drbd0;
        disk /dev/sdb;
        address 10.40.1.52:7788;
        meta-disk internal;
    }
}
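Because the names after `on` must match the uname -n output on each node, a quick check like the one below may save some debugging. The function names are hypothetical and the parsing is a sketch that assumes each host section starts on its own line as `on <name> {`.

```shell
# Print the host names declared with "on <name> {" in a DRBD resource file.
drbd_config_hosts() (
    awk '$1 == "on" { print $2 }' "${1:-/etc/drbd.d/vprotect.res}"
)

# Succeed when the given node name (default: uname -n) has an "on" section.
node_in_drbd_config() (
    res_file="${1:-/etc/drbd.d/vprotect.res}"
    node="${2:-$(uname -n)}"
    drbd_config_hosts "$res_file" | grep -qxF -- "$node"
)

# Usage on each node:
# node_in_drbd_config /etc/drbd.d/vprotect.res || echo "host name not found in config"
```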

We now have the configuration in place and can create our resource and bring it online.

  • On both nodes, run

    drbdadm create-md replicate
    #initializing activity log
    #initializing bitmap (4800 KB) to all zero
    #Writing meta data...
    #New drbd meta data block successfully created.

    then bring the volume online

    drbdadm up replicate

    You can verify if the device is up & running by issuing

    lsblk
    #NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    #sda                       8:0    0   16G  0 disk
    #├─sda1                    8:1    0    1G  0 part /boot
    #└─sda2                    8:2    0   15G  0 part
    #├─vg_vprotect-lv_root 253:0    0 13.4G  0 lvm  /
    #└─vg_vprotect-lv_swap 253:1    0  1.6G  0 lvm  [SWAP]
    #sdb                       8:16   0  150G  0 disk
    #└─drbd0                 147:0    0  150G  1 disk

    However, if we check

    drbdsetup status replicate
    #replicate role:Secondary
    #disk:Inconsistent
    #peer role:Secondary
    #replication:Established peer-disk:Inconsistent

    we will notice we need to start synchronization before we can use our volume.
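If you want to check the disk states from a script rather than by eye, the status text can be parsed. The function name `drbd_disk_states` is hypothetical, and the sketch assumes the output layout shown in this section (a `disk:` token for the local node and a `peer-disk:` token for the peer).

```shell
# Read "drbdsetup status" output on stdin and print the local and peer disk
# states as "local=<state> peer=<state>".
drbd_disk_states() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^disk:/)      { sub(/^disk:/, "", $i); loc = $i }
                if ($i ~ /^peer-disk:/) { sub(/^peer-disk:/, "", $i); peer = $i }
            }
        }
        END { printf "local=%s peer=%s\n", loc, peer }
    '
}

# Usage:
# drbdsetup status replicate | drbd_disk_states
```

Once synchronization has finished, both values are expected to read UpToDate.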

  • On the first server, run

    drbdadm primary --force replicate
    drbdsetup status replicate
    #replicate role:Primary
    #disk:UpToDate
    #peer role:Secondary
    #replication:SyncSource peer-disk:Inconsistent done:0.22

    This way we have successfully started the process of replication between servers with vprotect1 as the synchronization source.

    If you don't want to create a VDO device, then create and mount your filesystem:

    mkfs.xfs -K /dev/drbd0
    mount /dev/drbd0 /vprotect_data/ && chown -R vprotect:vprotect /vprotect_data
  • Create VDO volume (optional)

    By issuing the command below we will create a VDO volume called vdo_data and put it on top of our DRBD volume. Afterwards, we format it with XFS and mount it in /vprotect_data.

    vdo create --name=vdo_data --device=/dev/drbd0 --vdoLogicalSize=400G --compression=enabled --deduplication=enabled
    #Creating VDO vdo_data
    #Starting VDO vdo_data
    #Starting compression on VDO vdo_data
    #VDO instance 0 volume is ready at /dev/mapper/vdo_data
    
    mkfs.xfs -K /dev/mapper/vdo_data
    #meta-data=/dev/mapper/vdo_data   isize=512    agcount=4, agsize=26214400 blks
    #    =                       sectsz=4096  attr=2, projid32bit=1
    #    =                       crc=1        finobt=0, sparse=0
    #data     =                       bsize=4096   blocks=104857600, imaxpct=25
    #    =                       sunit=0      swidth=0 blks
    #naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
    #log      =internal log           bsize=4096   blocks=51200, version=2
    #    =                       sectsz=4096  sunit=1 blks, lazy-count=1
    #realtime =none                   extsz=4096   blocks=0, rtextents=0
    
    mount /dev/mapper/vdo_data /vprotect_data/ && chown -R vprotect:vprotect /vprotect_data
  • Copy the VDO config to the second node

scp /etc/vdoconf.yml root@vprotect2:/etc/vdoconf.yml
  • Disable VDO automatic startup

    As this resource will be managed by the cluster, we need to disable auto startup of this service on both nodes.

    systemctl disable vdo
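As a sanity check on the numbers in this section: the mkfs.xfs output reports blocks=104857600 with bsize=4096, which should correspond to the 400G logical size requested from VDO, while lsblk showed the backing DRBD device at 150G. The two helper names below are hypothetical; the arithmetic is integer-only.

```shell
# Size in GiB of a filesystem given its block count and block size in bytes.
blocks_to_gib() {
    echo $(( $1 * $2 / 1024 / 1024 / 1024 ))
}

# Logical-to-physical size ratio as a percentage (integer arithmetic only).
overcommit_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Usage with the values from this example:
# blocks_to_gib 104857600 4096     # logical size of the XFS filesystem
# overcommit_percent 400 150       # logical size relative to the physical device
```

The logical size in this example is therefore roughly 2.7 times the physical size, which only pays off if your backup data deduplicates and compresses well.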

Final cluster settings

At this point, we have three components set up. To fully utilize our HA cluster and eliminate the need for manual intervention, we should add the resources and settings below to our cluster.

Issue these commands on one node only, as the changes will propagate to the rest of the cluster.

pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create replicate ocf:linbit:drbd \
drbd_resource=replicate op monitor interval=10s --group fs_group

pcs -f drbd_cfg resource master replicateClone replicate \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
notify=true --group fs_group

pcs -f drbd_cfg resource create vdo_resource ocf:heartbeat:vdo-vol volume=vdo_data --group fs_group
pcs -f drbd_cfg resource create fs_resource ocf:heartbeat:Filesystem device=/dev/mapper/vdo_data directory=/vprotect_data fstype=xfs  --group fs_group
pcs cluster cib-push drbd_cfg --config

pcs constraint colocation add vdo_resource with replicateClone
pcs constraint order start vdo_resource then fs_resource
pcs constraint order start replicateClone then vdo_resource
pcs constraint colocation add vProtect-group with fs_group
pcs constraint colocation add vdo_resource with replicateClone INFINITY with-rsc-role=Master
pcs constraint order promote replicateClone then start fs_group

Here we created a temporary file, drbd_cfg, and added to it our DRBD resource called replicate, plus a Master/Slave set for this resource.

Afterwards, we have the definition of the vdo_resource and fs_resource in one fs_group followed by an update of the cluster configuration.

As a second step, we have put in place several resource colocations and constraints which allow us to control the order and existence of newly created resources.

We still need to

  • Make sure that our node is pointed to a localhost address. Check the Nodes UI section.

If the node's IP is different than 127.0.0.1, delete the node and re-register it using

vprotect node -e <Node_Name> admin http://127.0.0.1:8080/api
  • copy our license and node information from the first node to the second node:

scp -pr /opt/vprotect/.session.properties root@vprotect2:/opt/vprotect/
scp -pr /opt/vprotect/license.key root@vprotect2:/opt/vprotect/

MariaDB replication

In this section, we will cover how to set up master<->master MariaDB replication.

  • On both nodes, if you have the firewall enabled, allow communication via port 3306

firewall-cmd --add-port=3306/tcp --permanent
firewall-cmd --complete-reload

Steps to run on the first server, vplus1: 10.40.1.50

This server will be the source of DB replication.

  • Stop the vPlus server, node and database

systemctl stop vprotect-server vprotect-node mariadb
  • Edit the config file, enable binary logging and start MariaDB again. Depending on your distribution, the config file location may vary; most likely it is /etc/my.cnf or /etc/my.cnf.d/server.cnf.

    In the [mysqld] section, add the lines:

vi /etc/my.cnf.d/server.cnf

#Add the following lines:
log-bin
server_id=1
replicate-do-db=vprotect

systemctl start mariadb
  • Now log in to MariaDB, create a user for replication and grant it the appropriate rights.

    For the purpose of this task, we will set the username to 'replicator' and the password to 'R3pLic4ti0N'

mysql -u root -p
#Enter password:
#[..]
#MariaDB [(none)]> create user 'replicator'@'%' identified by 'R3pLic4ti0N';
#Query OK, 0 rows affected (0.026 sec)

#MariaDB [(none)]> grant replication slave on *.* to 'replicator'@'%';
#Query OK, 0 rows affected (0.001 sec)

#MariaDB [(none)]> FLUSH PRIVILEGES;
#Query OK, 0 rows affected (0.001 sec)

Don't log out just yet. We still need to check the master status and

  • write down the log file name and position, as it is required for proper slave configuration.

MariaDB [(none)]> show master status;
+----------------------+----------+--------------+------------------+
| File                 | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+----------------------+----------+--------------+------------------+
| vprotect1-bin.000007 |    46109 |              |                  |
+----------------------+----------+--------------+------------------+
  • Dump the vprotect database and copy it onto the second server (vprotect2).

mysqldump -u root -p vprotect > /tmp/vprotect.sql
scp /tmp/vprotect.sql root@vprotect2:/tmp/

Steps to run on the 2nd server, vplus2: 10.40.1.52

For the reader's convenience, I have only highlighted the differences in configuration between vplus1 and vplus2, and omitted the output of some commands if they are the same as on the previous node.

  • Stop the vPlus server, node and database

  • Edit the MariaDB config file. Assign a different server id, for example: 2. Then start MariaDB.

vi /etc/my.cnf.d/server.cnf

#Add the following lines:
log-bin
server_id=2
replicate-do-db=vprotect

systemctl start mariadb
  • Load the database dump copied from vplus1.

mysql -u root -p vprotect < /tmp/vprotect.sql

At this point, we have two identical databases on our two servers.

  • Log in to the MariaDB instance, create a replication user with a password. Use the same user as on vplus1. Grant the necessary permissions.

  • Set the master host. You must use the master_log_file and master_log_pos values written down earlier. Change the IP of the master host to match your network configuration.

MariaDB [(none)]> STOP SLAVE;
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST = '10.40.1.50', MASTER_USER = 'replicator',MASTER_PASSWORD='R3pLic4ti0N',MASTER_LOG_FILE = 'vprotect1-bin.000007',MASTER_LOG_POS=46109;
Query OK, 0 rows affected (0.004 sec)
  • Start the slave, check the master status and write down the file name and position.

MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)

MariaDB [(none)]> SHOW MASTER STATUS;
+----------------------+----------+--------------+------------------+
| File                 | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+----------------------+----------+--------------+------------------+
| vprotect2-bin.000002 |   501051 |              |                  |
+----------------------+----------+--------------+------------------+
1 row in set (0.000 sec)
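The CHANGE MASTER TO statements in this section are assembled by hand from the file name and position reported by SHOW MASTER STATUS. As a sketch of how that could be scripted, the hypothetical helper below reads one tab-separated line (file name, then position) and prints the statement. It only formats text and does not connect to any database.

```shell
# Build a CHANGE MASTER TO statement.
#   $1 = address of the peer acting as master, $2 = user, $3 = password
#   stdin = one tab-separated line: <log file><TAB><position>[<TAB>...]
build_change_master() (
    host="$1"; user="$2"; password="$3"
    tab=$(printf '\t')
    IFS="$tab" read -r log_file log_pos _rest
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$host" "$user" "$password" "$log_file" "$log_pos"
)

# Usage on the peer that should replicate from 10.40.1.50 (run the query on
# 10.40.1.50 itself; -N and -B give headerless, tab-separated output):
# mysql -u root -p -N -B -e 'SHOW MASTER STATUS' | build_change_master 10.40.1.50 replicator 'R3pLic4ti0N'
```

Review the printed statement before pasting it into the MariaDB prompt, and keep in mind that it contains the replication password in clear text.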

Go back to the first server (vplus1)

  • On vplus1, stop the slave, then change the master host using the parameters noted down in the previous step. Also, change the master host IP to match your network configuration.

MariaDB [(none)]> stop slave;
MariaDB [(none)]> change master to master_host='10.40.1.52', master_user='replicator', master_password='R3pLic4ti0N',MASTER_LOG_FILE = 'vprotect2-bin.000002', master_log_pos=501051;
Query OK, 0 rows affected (0.004 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)

At this point, you have successfully configured MariaDB master<->master replication.
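It is worth confirming on each server that replication is actually running. One way, sketched below with a hypothetical function name, is to inspect the Slave_IO_Running and Slave_SQL_Running fields of SHOW SLAVE STATUS. The function only parses text passed on stdin.

```shell
# Read "SHOW SLAVE STATUS\G" output on stdin; succeed only when both the I/O
# and the SQL replication threads report "Yes".
replication_healthy() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }               # strip the left padding
        $1 == "Slave_IO_Running"  { io = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Usage on each server:
# mysql -u root -p -e 'SHOW SLAVE STATUS\G' | replication_healthy && echo "replication OK"
```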

Testing the setup

Automatic

The fastest way to test our setup is to invoke

pcs node standby vprotect1

to put vplus1 into standby mode, which prevents it from hosting any cluster resources.

After a while, you should see your resources up and running on vplus2.

Note that if you perform a normal OS shutdown (not a forced one), Pacemaker will wait a long time for the node to come back online, which in fact prevents the shutdown from completing. As a result, resources will not switch correctly to the other node.
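To confirm where a resource ended up after the standby test, you can read the cluster status and look for the node name after the word Started. The helper below is a hypothetical sketch; the exact line layout of the status output is an assumption and may differ between pcs versions.

```shell
# Read cluster status text on stdin and print the node on which the named
# resource is reported as Started (prints nothing if it is not started).
resource_node() {
    awk -v r="$1" '
        {
            found = 0
            for (i = 1; i <= NF; i++) {
                if ($i == r) found = 1
                else if (found && $i == "Started") { print $(i + 1); exit }
            }
        }
    '
}

# Usage:
# pcs status resources | resource_node Failover_IP
```

When you are done testing, `pcs node unstandby vprotect1` makes the first node eligible to host resources again.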

Manual

If you want to dive a little bit deeper, we have prepared instructions on how to manually move a filesystem resource from the first node to the second.

  1. Stop the vPlus services.

    systemctl stop vprotect-server && systemctl stop vprotect-node
  2. Unmount the FS used by DRBD/VDO on the primary server (here vplus1).

    drbdadm role replicate
    #Primary/Secondary
    umount /vprotect_data/
  3. If you are using a VDO device, stop it.

    vdo stop -n vdo_data
    #Stopping VDO vdo_data
  4. Demote the primary replication server (still vplus1) to secondary server.

    drbdadm secondary replicate

On the second server

  1. Promote the second server (here vplus2) to the primary DRBD role.

    drbdadm primary replicate
  2. Start the VDO.

    vdo start -n vdo_data
    #Starting VDO vdo_data
    #Starting compression on VDO vdo_data
    #VDO instance 2 volume is ready at /dev/mapper/vdo_data
  3. Mount the filesystem on the second server.

    mount /dev/mapper/vdo_data /vprotect_data/

Now you have your replicated volume mounted on the second node.
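The manual steps above can be collected into two small functions, one per node. This is only a sketch that reuses the commands from this section: the function names and the DRY_RUN switch are hypothetical, and by default the commands are printed rather than executed so that you can review them first.

```shell
# Run a command, or only print it when DRY_RUN=1 (the default in this sketch).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps for the node that currently holds the Primary role.
release_storage() {
    run systemctl stop vprotect-server &&
    run systemctl stop vprotect-node &&
    run umount /vprotect_data/ &&
    run vdo stop -n vdo_data &&
    run drbdadm secondary replicate
}

# Steps for the node that takes over.
take_over_storage() {
    run drbdadm primary replicate &&
    run vdo start -n vdo_data &&
    run mount /dev/mapper/vdo_data /vprotect_data/
}

# Usage: review the printed commands first, then execute them for real.
# release_storage                 # on the current primary, prints only
# DRY_RUN=0 release_storage       # on the current primary, executes
# DRY_RUN=0 take_over_storage     # afterwards, on the other node
```

If you are not using VDO, drop the two vdo lines and mount /dev/drbd0 instead.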