NFS Failover with DRBD and Heartbeat on AWS EC2

This tutorial walks through configuring NFS failover using DRBD and Heartbeat on AWS EC2. There are plenty of tutorials covering DRBD and Heartbeat in general, but few that address the AWS EC2 specifics.

NOTE: Run the commands/scripts on BOTH SERVERS (Primary and Secondary) unless explicitly mentioned otherwise.

Requirements

  • VPC with a public subnet and Internet gateway.
  • IAM Role for EC2 (allowing ec2:DescribeInstances and ec2:AssignPrivateIpAddresses)
  • 2 x Ubuntu Instances (for NFS Primary and Secondary)
  • 1 x Ubuntu Instance (for NFS Client; testing)
  • 3 Elastic IPs (for Primary, Secondary and Virtual IP)

Create a VPC

Follow this guide: HERE

Create a Security Group

Follow this guide: HERE

Create an Amazon EC2 IAM Role with the following policy

{
  "Statement": [{
    "Action": [
      "ec2:AssignPrivateIpAddresses",
      "ec2:DescribeInstances"
    ],
    "Effect": "Allow",
    "Resource": "*"
  }]
}
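
If you prefer the command line over the console, a minimal sketch using the AWS CLI looks like this (the names nfs-ha-role, nfs-ha-policy and nfs-ha-profile, and the file ec2-policy.json holding the policy above, are placeholders):

$ cat > ec2-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
$ aws iam create-role --role-name nfs-ha-role \
    --assume-role-policy-document file://ec2-trust.json
$ aws iam put-role-policy --role-name nfs-ha-role \
    --policy-name nfs-ha-policy --policy-document file://ec2-policy.json
$ aws iam create-instance-profile --instance-profile-name nfs-ha-profile
$ aws iam add-role-to-instance-profile --instance-profile-name nfs-ha-profile \
    --role-name nfs-ha-role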

Launch Two Ubuntu EC2 instances into Your VPC’s Public Subnet

  1. Assign the EC2 IAM Role created above to both instances (do not skip this step).
  2. Assign Private IPs (Primary: 192.168.0.10, Secondary: 192.168.0.11, Virtual IP: 192.168.0.12)
  3. Assign the Security Group created above.
  4. Configure Elastic IP Addresses for Your Instances.
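
The Virtual IP can also be attached as a secondary private IP from the command line; a sketch, assuming the Primary instance's network interface ID is eni-xxxxxxxx (look it up with aws ec2 describe-instances):

$ aws ec2 assign-private-ip-addresses \
    --network-interface-id eni-xxxxxxxx \
    --private-ip-addresses 192.168.0.12 \
    --allow-reassignment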

Create the vipup script

Adjust the values below (peer IP, VIP, region) for each server; do not copy and paste it verbatim.

#!/bin/bash
# This script monitors the other HA node and takes over the Virtual IP (VIP)
# if communication with that node fails.
# Adjust per server: on the Primary, HA_Node_IP is the Secondary's address,
# and vice versa.
HA_Node_IP=192.168.0.11
VIP=192.168.0.12
REGION=us-east-1

# Discover this instance's ID and its primary network interface (ENI)
Instance_ID=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id)
ENI_ID=$(ec2-describe-instances "$Instance_ID" --region "$REGION" | grep eni -m 1 | awk '{print $2;}')

# Count successful ping replies from the peer; 0 means the peer is unreachable
pingresult=$(ping -c3 -W 1 "$HA_Node_IP" | grep time= | wc -l)
if [ "$pingresult" == "0" ]; then
    echo "$(date) -- HA heartbeat failed, taking over VIP"
    # Reassign the VIP to this instance's ENI inside the VPC...
    ec2-assign-private-ip-addresses -n "$ENI_ID" \
        --secondary-private-ip-address "$VIP" \
        --allow-reassignment --region "$REGION"
    # ...and bring it up on a local alias interface
    ifconfig eth0:0 "$VIP" netmask 255.255.255.0 > /dev/null 2>&1
fi

Mark it executable:

$ chmod +x vipup
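
Heartbeat resolves the resource names listed in haresources (further below) from /etc/ha.d/resource.d/ or /etc/init.d/, so the script needs to live there under the name vipup; for example:

$ cp vipup /etc/ha.d/resource.d/vipup
$ chmod +x /etc/ha.d/resource.d/vipup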


Additional steps to configure on both servers:

  1. Add a routing table for the Virtual IP:

    $ echo "2 eth1_rt" >> /etc/iproute2/rt_tables ;
  2. Add an eth0:0 interface with the configuration below so that the Virtual IP is reachable on the AWS internal network (a quick verification sketch follows this list):

    $ vim /etc/network/interfaces.d/eth0:0.cfg

    auto eth0:0
    iface eth0:0 inet dhcp
    up ip route add default via 192.168.0.1 dev eth0:0 table eth1_rt
    up ip rule add from 192.168.0.12 lookup eth1_rt prio 1000
  3. Edit the hosts file to add entries for the Primary and Secondary servers:

    $ vim /etc/hosts

    192.168.0.10 primary.nfs.server
    192.168.0.11 secondary.nfs.server
  4. Edit the hostname file on each server:

    $ vim /etc/hostname

    primary.nfs.server (on Primary Server)
    secondary.nfs.server (on Secondary Server)
  5. Set the hostname (then log out and log back in):

    $ hostname -F /etc/hostname
  6. Update the repositories and install the AWS tools:

    $ apt-add-repository ppa:awstools-dev/awstools
    $ apt-get update
    $ apt-get install ntpdate tzdata ec2-api-tools ec2-ami-tools iamcli rdscli moncli ascli elasticache awscli
  7. Test:

    $ ec2-describe-instances

    or

    $ aws ec2 describe-instances
  8. Update the time for proper synchronization (very important):

    $ ntpdate -u in.pool.ntp.org
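
After completing these steps, a few quick checks confirm that the secondary routing table, the eth0:0 alias, and the hostname are in place (a sketch; it assumes /etc/network/interfaces sources the interfaces.d directory, which is the default on Ubuntu cloud images, and the exact route output depends on your subnet):

$ ifup eth0:0
$ ip rule show | grep eth1_rt
$ ip route show table eth1_rt
$ hostname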

Install DRBD:

Update the repositories and install the DRBD software (a reboot is required):

$ apt-get update
$ apt-get install drbd8-utils
$ apt-get install linux-image-extra-virtual
$ depmod -a
$ sleep 5; reboot
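
After the reboot, confirm that the DRBD kernel module is actually available before configuring any resources:

$ modprobe drbd
$ lsmod | grep drbd
$ cat /proc/drbd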

Set up DRBD for a particular device (example: /dev/xvdb):

  1. To configure DRBD, edit /etc/drbd.conf and change global { usage-count yes; } to no (skip this if it is already set).
  2. Create a resource file r0.res in /etc/drbd.d/

    $ vim /etc/drbd.d/r0.res (edit as required)

    resource r0 {
      net {
        #on-congestion pull-ahead;
        #congestion-fill 1G;
        #congestion-extents 3000;
        #sndbuf-size 1024k;
        sndbuf-size 0;
        max-buffers 8000;
        max-epoch-size 8000;
      }
      disk {
        #no-disk-barrier;
        #no-disk-flushes;
        no-md-flushes;
      }
      syncer {
        c-plan-ahead 20;
        c-fill-target 50k;
        c-min-rate 10M;
        al-extents 3833;
        rate 100M;
        use-rle;
      }
      startup { become-primary-on primary.nfs.server; }
      protocol C;

      # The "on <name>" sections must match each node's hostname (uname -n)
      on primary.nfs.server {
        device /dev/drbd0;
        disk /dev/xvdb;
        meta-disk internal;
        address 192.168.0.10:7801;
      }

      on secondary.nfs.server {
        device /dev/drbd0;
        disk /dev/xvdb;
        meta-disk internal;
        address 192.168.0.11:7801;
      }
    }
  3. The file r0.res must be identical on both servers.

  4. Now initialize the metadata storage using the drbdadm utility. On each server, execute:

    $ drbdadm create-md r0
  5. Next, on both hosts, start the drbd daemon:

    $ /etc/init.d/drbd start
  6. On the Primary Server only, run:

    $ drbdadm -- --overwrite-data-of-peer primary all
  7. After executing the above command, the data will start syncing to the Secondary Server. To watch the progress, run the following on the Secondary Server:

    $ watch -n1 cat /proc/drbd
  8. Finally, on the Primary Server, create a filesystem on /dev/drbd0 and mount it (the device is read-only while a node is in the Secondary role; create the /drbd mount point on both servers). A quick way to check the DRBD state follows this list:

    $ mkfs.ext4 /dev/drbd0
    $ mkdir /drbd
    $ mount /dev/drbd0 /drbd
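
Once the initial sync has finished, a quick way to confirm the cluster is healthy is to query the connection state, roles and disk states with drbdadm; you should see Connected, Primary/Secondary (on the Primary) and UpToDate/UpToDate:

$ drbdadm cstate r0
$ drbdadm role r0
$ drbdadm dstate r0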

Install Heartbeat (for node failure detection)

  1. Install heartbeat:

    $ apt-get install heartbeat
  2. Edit the ha.cf file under /etc/ha.d/:

    $ vim ha.cf

    # Give the cluster 60 seconds to start
    initdead 60
    # Keep alive packets every 1 second
    keepalive 1
    # Misc settings
    traditional_compression off
    deadtime 60
    deadping 60
    warntime 5
    # Nodes in cluster
    node primary.nfs.server secondary.nfs.server
    # Use logd, configure /etc/logd.cf
    use_logd on
    # Don't move service back to preferred host when it comes up
    auto_failback off
    # Takeover if pings (above) fail
    respawn hacluster /usr/lib/heartbeat/ipfail
    ##### Use unicast instead of default multicast so firewall rules are easier
    # primary
    ucast eth0 192.168.0.10
    # secondary
    ucast eth0 192.168.0.11
    # broadcast is not supported inside a VPC, so bcast is left disabled
    #bcast eth0
  3. Edit the haresources file for Heartbeat to use:

    $ vim haresources

    primary.nfs.server drbddisk::r0 Filesystem::/dev/drbd0::/drbd::ext4 vipup nfs-kernel-server
  4. Edit the authkeys file (for authentication between nodes) and make sure it is readable only by root (chmod 600 /etc/ha.d/authkeys), otherwise Heartbeat will refuse to start:

    $ vim authkeys

    # Automatically generated authkeys file
    auth 1
    1 sha1 1a8c3f11ca9e56497a1387c40ea95ce1

    Or generate the file with the command below:

    cat <<EOF > /etc/ha.d/authkeys
    # Automatically generated authkeys file
    auth 1
    1 sha1 `dd if=/dev/urandom count=4 2>/dev/null | md5sum | cut -c1-32`
    EOF
  5. Enable logging for Heartbeat:

    $ vim /etc/logd.cf

    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    syslogprefix linux-ha
  6. Create a symlink to the nfs-kernel-server init script in /etc/ha.d/resource.d/:

    $ ln -s /etc/init.d/nfs-kernel-server /etc/ha.d/resource.d/
  7. Add an fstab entry for the DRBD device (noauto, since Heartbeat mounts it via the Filesystem resource):

    # DRBD, mounted by heartbeat
    /dev/drbd0 /drbd ext4 noatime,noauto,nobarrier 0 0
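
With the configuration in place on both servers, start Heartbeat and verify that the nodes see each other; cl_status ships with the heartbeat package (run the nodestatus check against the peer's hostname):

$ /etc/init.d/heartbeat start
$ cl_status hbstatus
$ cl_status listnodes
$ cl_status nodestatus secondary.nfs.server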

Configure NFS Exports

  1. Edit the exports file:

    $ vim /etc/exports

    /drbd 192.168.0.0/24(rw,async,no_subtree_check,fsid=0)
  2. Apply the export configuration:

    $ exportfs -a
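
To confirm the share is actually being served, list the active exports on the node that currently holds /drbd and runs nfs-kernel-server:

$ exportfs -v
$ showmount -e localhost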

NFS Client:

  1. Install NFS packages on client so that exports can be mounted:

    $ apt-get install nfs-common
  2. Check for mounts available:

    $ showmount -e 192.168.0.12
  3. Mount the shared folder on the server:

    $ mkdir /drbd
    $ mount -vvv 192.168.0.12:/drbd/ -o nfsvers=3,rsize=32768,wsize=32768,hard,timeo=50,bg,actimeo=3,noatime,nodiratime,intr /drbd/
    $ df -h
  4. Start copying data (for testing):

    $ rsync -av --progress --append --bwlimit=10240 /drbd/A_BIG_FILE /tmp/
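
If the client should also mount the share at boot, an /etc/fstab entry along these lines (same options as the manual mount above) is one way to do it:

# NFS share served via the Virtual IP
192.168.0.12:/drbd /drbd nfs nfsvers=3,rsize=32768,wsize=32768,hard,timeo=50,bg,actimeo=3,noatime,nodiratime,intr 0 0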

Testing NFS Fail-over:

  1. From another system, mount the NFS share from the cluster
  2. Use rsync --progress -av to start copying a large file (1-2 GB) to the share.
  3. When the progress reaches 20%-30%, stop the heartbeat service on the Primary (or power off/reboot the instance).
  4. rsync will stall (as intended) because the hard-mounted NFS share blocks I/O during the takeover.
  5. After 5-10 seconds, the file should continue transferring until finished with no errors.
  6. Do an md5 checksum comparison of the original file and the file on the NFS share (see the sketch after this list).
  7. Both files should be identical; if not, there was corruption of some kind.
  8. Try the test again by reading from NFS rather than writing to it.
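
A minimal checksum comparison, assuming the test file was copied from a local path such as /root/A_BIG_FILE onto the share (adjust both paths to wherever your copies actually live):

$ md5sum /root/A_BIG_FILE /drbd/A_BIG_FILE
# the two hashes printed should be identical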
