We want to take the opportunity to explain LINBIT’s best practices regarding DRBD and backup procedures.
DRBD is designed as a storage solution that provides High Availability, Disaster Recovery and Cross Site High Availability for your systems. As developers of DRBD, we sometimes hear from the community that some folks are using DRBD as a “pseudo” backup solution. In response, we want to share some general guidelines and key best practices for using DRBD properly alongside a real backup strategy.
Although DRBD is not backup software, that doesn’t mean you can’t use it in your backup procedures. With LVM as DRBD’s backing device, you can create backups with minimal to no impact on performance. This is done with LVM snapshots, as outlined in LINBIT’s DRBD User’s Guide. Although that section of the guide describes taking snapshots before and after a resync, the same steps can easily be adapted to a cron job. Essentially, you would disconnect the Secondary, snapshot the backing device, mount the snapshot, perform the backup, unmount the snapshot, and reconnect the Secondary. These point-in-time backups work well for iSCSI targets, virtual machine storage, and databases such as MySQL and PostgreSQL. As you can imagine, this methodology is quite popular in the Linux HA and DRBD communities.
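As a rough sketch, the sequence above could be wrapped in a shell function like the one below. The helper names (`run`, `snapshot_backup`), the 1G snapshot size, the mount path and the tar destination are illustrative assumptions and are not taken from the User’s Guide. The sketch also assumes that the logical volume sits underneath DRBD with internal metadata, so the snapshot can be mounted directly. With `DRY_RUN=1` (the default) it only prints the commands it would run.

```shell
# Sketch only: all names and sizes below are hypothetical placeholders.
# DRY_RUN=1 (the default) prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: snapshot_backup <drbd-resource> <volume-group> <logical-volume> <dest-dir>
snapshot_backup() {
    local resource=$1 vg=$2 lv=$3 dest=$4
    local snap="${lv}-backup-snap"
    local mnt="/mnt/${snap}"
    local rc=0

    # 1. stop replication to this (Secondary) node
    run drbdadm disconnect "$resource" || return 1

    # 2.-4. snapshot the backing device, mount it read-only, archive it
    {
        run lvcreate --snapshot --size 1G --name "$snap" "/dev/${vg}/${lv}" &&
        run mkdir -p "$mnt" &&
        run mount -o ro "/dev/${vg}/${snap}" "$mnt" &&
        run tar -czf "${dest}/${resource}-$(date +%Y-%m-%d).tar.gz" -C "$mnt" .
    } || rc=1

    # 5.-6. clean up and reconnect even if a step above failed
    run umount "$mnt"
    run lvremove -f "/dev/${vg}/${snap}"
    run drbdadm connect "$resource" || rc=1
    return $rc
}
```

Calling `snapshot_backup r0 vg0 data /backup` on the Secondary prints the eight commands in order. Only after reviewing them against your own setup would you set `DRY_RUN=0`, for example from a nightly cron job.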
LINBIT advises systems administrators to:
- Utilize DRBD for High Availability, Disaster Recovery and Cross Site High Availability (business continuity) purposes.
- Plan, review and execute a full backup strategy that makes sense for your organization and data. Keep in mind how much data you plan to store and back up, and at what intervals. Choose the points in time for your backups carefully to limit the damage from things such as user error. In many cases, backing up every day is the appropriate strategy.
- Test, test, test. We cannot say this enough. We develop software that is designed to prevent loss from failure, so you could say we’re experts on this topic. It’s very important that you test not only DRBD’s configuration but also the components that make up your backup system. Then, on a scheduled basis, review your data to ensure its completeness and correctness. On an annual basis, it is also wise to review your top-level strategy and update it if your requirements have changed. In short, routinely test your backup procedure and verify (checksum) your backups to ensure their completeness.
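The “verify (checksum)” advice can be as simple as storing a SHA-256 checksum next to each archive and checking it on a schedule. The following is a minimal sketch, assuming the `sha256sum` tool is available. The function names and the `.sha256` suffix are hypothetical choices for illustration.

```shell
# Sketch: record a checksum when the backup is written, verify it later.
# Function names and the ".sha256" suffix are illustrative assumptions.

# usage: write_checksum /path/to/archive
write_checksum() {
    local archive=$1
    # run inside the archive's directory so the checksum file stores a relative name
    ( cd "$(dirname "$archive")" &&
      sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256" )
}

# usage: verify_checksum /path/to/archive   (returns non-zero on mismatch)
verify_checksum() {
    local archive=$1
    ( cd "$(dirname "$archive")" &&
      sha256sum -c "$(basename "$archive").sha256" > /dev/null )
}
```

A matching checksum only shows that the archive has not changed since it was written. A periodic test restore is still needed to show that the backup is usable.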
In closing, DRBD is designed to prevent loss of service as the result of equipment failure. LINBIT strongly advises systems administrators to implement a strategy that incorporates “point in time” backups so administrators can restore, rewind and rejoice knowing that they’re backed not only by the best open source replication technology, DRBD, but also by a comprehensive backup solution designed with the organization’s needs in mind.
How do you back up your DRBD cluster?
Share your thoughts or comments below!
I was trying to do backups using snapshots for some time, but it failed. I am using DRBD as a backing device for LVM, which in turn contains our cluster data (Xen, Pacemaker and so on).
One of the quirks is that you need to take the snapshot and run the backup on the same node. Also, while the backup runs, the high throughput on your disks may cause Pacemaker to start lagging and get STONITHed.
So now I do backups inside every virtual machine. Not a nice solution, but safe and working fine.
Please note: this is not DRBD’s fault, I am just describing my case here.
With LVM on top of DRBD, snapshots can only be taken on the node where DRBD is Primary. However, if you use LVM underneath DRBD (a logical volume as DRBD’s backing device), it is possible to take the snapshot and run the backup on the Secondary node. Thus you can make a backup without interfering with or impeding the performance of the Primary node.
Agreed, backing up from the Secondary via snapshots of LVM under DRBD works a treat, though it will still impact the performance of writes to the DRBD device. And if both of your nodes have Primary resource(s), you will still have a performance impact regardless.
It’s easy to mitigate this by prefixing your backup commands with:
nice -n 19 ionice -c 3
This gives your backup process the lowest CPU and I/O priority, which has a much smaller impact on performance while the backup is taking place and especially helps against the Pacemaker lag problem mentioned by the OP.
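For repeated use, the prefix can be wrapped in a small helper. This is only a sketch: `low_prio` is a hypothetical name, and the fallback to plain `nice` when `ionice` is missing or not permitted is my own assumption about what is sensible.

```shell
# Sketch: run any command with the lowest CPU priority and, where possible,
# the idle I/O class. "low_prio" is a hypothetical helper name.
low_prio() {
    # probe first: ionice may be missing or refused in restricted environments
    if ionice -c 3 true > /dev/null 2>&1; then
        nice -n 19 ionice -c 3 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Example (paths are placeholders):
#   low_prio tar -czf /backup/data.tar.gz /srv/data
```

Keep in mind that the idle I/O class only has an effect when the disk’s I/O scheduler honours I/O priorities (CFQ or BFQ, for example), so it is worth checking which scheduler your backup disks use.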
Hello,
for me the best way (not the safest..) was to break up the DRBD synchronisation every night and back up the raw data inside. I made a script to automate this and it’s working fine.
The only drawback is that during the backup there is no Secondary for failover.
Maybe it could help someone:
/root/backup_raw_automated.sh
#!/bin/bash
# backup_raw_automated.sh
#
# gea@itandtel.at – 2011/2012
##### VARIABLES ###################################################################################
CUSTOMER=
HOST=`hostname |awk -F. '{ print $1 }'`
iface=eth2
resource=r0
drbddev=/dev/drbd1
mountpoint=/mysql_drbd
localip=192.168.255.11
remoteip=192.168.255.10
LOG=/var/log/backup/backup_database_raw_`date +%Y-%m-%d`.log
BKTARDESTFLD=/Backup
BKTARDESTFILE=$BKTARDESTFLD/backup_raw_mysql_`date +%Y.%m.%d`.tar.bz2
BKTARSOURCES=$mountpoint/mysql/
RESYNC=
#VARERRORS=0
#FFERRORS=0
LOCK=/tmp/backup_databases_raw.lock
ROLE=`drbdadm role $resource |awk -F/ '{ print $1 }'`
TIMEDIFFSEC=0
TIMEDIFFMIN=0
EMAIL1=
EMAIL2=
MAILSENDER="`hostname -a |tr '[:lower:]' '[:upper:]'`"
ERRSUM=0
#devel exit
#exit
###### STARTUP ####################################################################################
echo "##################################################################################" >> $LOG
echo "`date +%H:%M:%S` + `date +%Y.%m.%d` - start backup up raw files" >> $LOG
echo "----------------------------------------------------------------------------------" >> $LOG
##### LOCK FILE ###################################################################################
if [ -e $LOCK ];then
    echo "lock file exist! abort.." >> $LOG
    echo "lock file exist! abort.."
    exit
fi
touch $LOCK
##### PRE CHECK + WHO I AM? PRIMARY/SECONDARY #####################################################
echo "`date +%H:%M:%S` - PRECHECK ON" >> $LOG
TIMESTART=`date +%s`
[ -e $BKTARDESTFLD ] || mkdir $BKTARDESTFLD
[ -e $BKTARDESTFLD ] || echo "$BKTARDESTFLD could not be created.. abort.." >> $LOG
[ -e $BKTARDESTFLD ] || exit
[ -e $drbddev ] || echo "no $drbddev - abort.." >> $LOG
[ -e $drbddev ] || exit
[ -e $mountpoint ] || mkdir $mountpoint
[ -e $mountpoint ] || echo "$mountpoint could not be created.. abort.." >> $LOG
[ -e $mountpoint ] || exit
echo -n -e "`date +%H:%M:%S` - drbd role check: " >> $LOG
# $ROLE is quoted so that an empty value falls through to the CONFUSION branch
if [ "$ROLE" = "Primary" ];then
    echo "primary => abort!" >> $LOG
    rm $LOCK
    exit
elif [ "$ROLE" = "Secondary" ];then
    echo "secondary => ok" >> $LOG
else
    echo "CONFUSION - i dont know which role we have! abort.." >> $LOG
    rm $LOCK
    exit
fi
# abort if any required variable is empty
for var in iface resource drbddev mountpoint localip remoteip LOG BKTARDESTFLD BKTARDESTFILE BKTARSOURCES LOCK; do
    if [ -z "${!var}" ];then
        echo "var \$$var is empty"
        exit
    fi
done
echo "`date +%H:%M:%S` - PRECHECK DONE - all right!" >> $LOG
###################################################################################################
### DELETE OLD BACKUPS ###
###################################################################################################
echo "`date +%H:%M:%S` - Clean up $BKTARDESTFLD" >> $LOG
find $BKTARDESTFLD -name 'backup_raw*' -ctime +1 >> $LOG ## -ctime +1 => status changed at least two full days ago (find ignores fractional days)
find $BKTARDESTFLD -name 'backup_raw*' -ctime +1 -exec rm {} \; ## and remove them..
echo "`date +%H:%M:%S` - Clean up done" >> $LOG
###################################################################################################
### NOW BREAK UP SYNCHRONISATION ###
###################################################################################################
##### DISCONNECT IFACE ############################################################################
echo "`date +%H:%M:%S` - STARTING BACKUP" >> $LOG
echo "`date +%H:%M:%S` - >> shutting down interface $iface <<" >> $LOG
ifconfig $iface down
# check iface is down
LINK=`ethtool $iface |grep "Link detected" |awk '{ print $3 }'`
if [ "$LINK" = yes ];then
    echo " ABORT - Interface $iface is still up! abort.." >> $LOG
    # count the failure in the error summary
    ERRSUM=`expr $ERRSUM + 1`
    exit
elif [ "$LINK" = no ];then
    echo " OK - Interface $iface done" >> $LOG
else
    echo " CONFUSION - Interface $iface - failure! abort.." >> $LOG
    ERRSUM=`expr $ERRSUM + 1`
fi
##### DRBD ROLE TO PRIMARY ########################################################################
echo "`date +%H:%M:%S` - >> drbd role to primary <<" >> $LOG
drbdadm primary $resource
sleep 2
# check drbd role
ROLE=`drbdadm role $resource |awk -F/ '{ print $1 }'`
if [ "$ROLE" = "Primary" ];then
    echo " OK - drbd state of resource $resource is now primary" >> $LOG
elif [ "$ROLE" = "Secondary" ];then
    echo " ABORT - drbd state of resource $resource is still secondary! abort.." >> $LOG
    ERRSUM=`expr $ERRSUM + 1`
    exit
else
    echo " CONFUSION - drbd state of resource $resource ... abort" >> $LOG
    ERRSUM=`expr $ERRSUM + 1`
    exit
fi
###### MOUNT DRBD DEVICE ##########################################################################
echo "`date +%H:%M:%S` - >> mounting drbd device <<" >> $LOG
mount $drbddev $mountpoint
# keep mount's exit code; any non-zero value means the mount failed
RC=$?
if [ $RC = 0 ];then
    echo " OK - mounting of $drbddev to $mountpoint done" >> $LOG
else
    echo " ABORT - mounting of $drbddev to $mountpoint failed (exit code $RC)! abort.." >> $LOG
    ERRSUM=`expr $ERRSUM + $RC`
    exit
fi
##### TAR OF MYSQL RAW FILES ######################################################################
echo "`date +%H:%M:%S` - >> starting tar backup <<" >> $LOG
tar -cjPf $BKTARDESTFILE $BKTARSOURCES
# keep tar's exit code before any other command overwrites $?
RC=$?
if [ $RC = 0 ];then
    echo " OK (`date +%H:%M:%S`) - tar backup from $BKTARSOURCES to $BKTARDESTFILE finished" >> $LOG
else
    # no exit here: the steps below still have to restore replication
    echo " ABORT (`date +%H:%M:%S`) - tar backup has problems (exit code $RC)! $BKTARSOURCES to $BKTARDESTFILE failed" >> $LOG
    ERRSUM=`expr $ERRSUM + $RC`
fi
###################################################################################################
### NOW BACK TO SYNCHRONISATION ###
###################################################################################################
echo "`date +%H:%M:%S` - BACK TO SYNC" >> $LOG
##### UNMOUNTING DRBD DEVICE ######################################################################
echo "`date +%H:%M:%S` - >> unmount $mountpoint <<" >> $LOG
if mount |grep $mountpoint > /dev/null;then
    umount $mountpoint
    if [ $? = 0 ];then
        echo " OK - unmounting done" >> $LOG
    else
        echo "ABORT - unmounting failed!! abort.." >> $LOG
        ERRSUM=`expr $ERRSUM + 1`
        exit
    fi
else
    echo "$mountpoint is not mounted.. but.. forward.." >> $LOG
fi
##### HIGH CRITICAL – DRBD ROLE SECONDARY ########################################################
echo "`date +%H:%M:%S` - >> drbd role to secondary <<" >> $LOG
ROLE=`drbdadm role $resource |awk -F/ '{ print $1 }'`
if [ "$ROLE" = "Primary" ];then
    echo " OK - drbd role is currently $ROLE - now set to Secondary" >> $LOG
    drbdadm disconnect $resource
    drbdadm secondary $resource
    if [ $? = 0 ];then
        sleep 1
        ROLE=`drbdadm role $resource |awk -F/ '{ print $1 }'`
        if [ "$ROLE" = "Secondary" ];then
            echo " OK - drbd role is now $ROLE" >> $LOG
            RESYNC=1
        else
            echo " ABORT - drbd role is not secondary - role: $ROLE !! abort.." >> $LOG
            RESYNC=0
            ERRSUM=`expr $ERRSUM + 1`
            exit
        fi
    else
        echo " FAILED - drbd role to secondary has errors.. abort.." >> $LOG
        ERRSUM=`expr $ERRSUM + 1`
        exit
    fi
else
    echo " WARN - drbd role is not primary!! but.. forward.." >> $LOG
    echo "role: $ROLE"
    RESYNC=1
fi
##### CRITICAL – CONNECT IFACE ####################################################################
echo "`date +%H:%M:%S` - >> connect iface <<" >> $LOG
link=`ethtool $iface |grep "Link detected" |awk '{ print $3 }'`
if [ "$link" = no ];then
    ifconfig $iface up
    link=`ethtool $iface |grep "Link detected" |awk '{ print $3 }'`
    if [ "$link" = yes ];then
        echo " OK - link is now connected!" >> $LOG
        # ping both ends of the replication link
        for ip in $localip $remoteip; do
            if ping -c 1 $ip > /dev/null;then
                echo " OK - ping $ip success" >> $LOG
            else
                echo " FAILED - ping $ip failed!" >> $LOG
                ERRSUM=`expr $ERRSUM + 1`
            fi
        done
    else
        echo " ABORT - link is not connected!! abort.." >> $LOG
        ERRSUM=`expr $ERRSUM + 1`
        exit
    fi
else
    echo "WARN - link IS connected - should not be - but.. forward.." >> $LOG
fi
##### HIGHEST CRITICAL - DRBD DISCARD DATA ########################################################
echo "`date +%H:%M:%S` - >> drbd resync - discard local data <<" >> $LOG
[ -z "$RESYNC" ] && echo "`date +%H:%M:%S` - VAR RESYNC is empty - exit!!" >> $LOG
[ -z "$RESYNC" ] && exit
#echo "resync: $RESYNC"
if [ $RESYNC = 1 ];then
    drbdadm -- --discard-my-data connect $resource
    if [ $? = 0 ];then
        echo " OK - drbd discarding local data" >> $LOG
    else
        echo " FAILED - drbd discarding local data" >> $LOG
        ERRSUM=`expr $ERRSUM + 1`
    fi
else
    echo " VAR RESYNC is 0 - no resync!" >> $LOG
    exit
fi
##### SYNC STATE
echo "`date +%H:%M:%S` - SYNC-STATE CHECK" >> $LOG
GREPSTRING=sync
GREPFILE=/proc/drbd
### give drbd time to start syncing
sleep 5
echo "`date +%H:%M:%S` - are we synchron?" >> $LOG
# loop for as long as /proc/drbd still reports a sync in progress
while grep -q $GREPSTRING $GREPFILE; do
    echo "Sync in progress.. wait 1 sec.." >> $LOG
    echo "---------------------------------------------" >> $LOG
    cat $GREPFILE >> $LOG
    echo "---------------------------------------------" >> $LOG
    sleep 1
done
echo "`date +%H:%M:%S` Sync process finished!" >> $LOG
##### STATE CHECK
echo "`date +%H:%M:%S` - STATE CHECK" >> $LOG
echo " Sleeping 60 seconds - giving drbd states chance to take over" >> $LOG
echo "" >> $LOG
sleep 60
ROLELOCAL=`cat /proc/drbd |grep ro: |awk '{ print $3 }' |awk -F: '{ print $2 }' |awk -F/ '{ print $1 }'`
ROLEREMOTE=`cat /proc/drbd |grep ro: |awk '{ print $3 }' |awk -F: '{ print $2 }' |awk -F/ '{ print $2 }'`
STATELOCAL=`cat /proc/drbd |grep ro: |awk '{ print $4 }' |awk -F: '{ print $2 }' |awk -F/ '{ print $1 }'`
STATEREMOTE=`cat /proc/drbd |grep ro: |awk '{ print $4 }' |awk -F: '{ print $2 }' |awk -F/ '{ print $2 }'`
ROLESTATEERR=0
# log one check (label, actual value, expected value) and count a mismatch
check_state() {
    if [ "$2" = "$3" ];then
        echo "$1: $2 - OK" >> $LOG
    else
        echo "$1: $2 - ERROR" >> $LOG
        ROLESTATEERR=`expr $ROLESTATEERR + 1`
        ERRSUM=`expr $ERRSUM + 1`
    fi
}
check_state "Role local" "$ROLELOCAL" Secondary
check_state "Role remote" "$ROLEREMOTE" Primary
check_state "State local" "$STATELOCAL" UpToDate
check_state "State remote" "$STATEREMOTE" UpToDate
echo "" >> $LOG
echo "********************" >> $LOG
echo "Error Summary: $ERRSUM" >> $LOG
echo "********************" >> $LOG
echo "" >> $LOG
if [ $ROLESTATEERR -gt 0 ];then
###################################################################################################
### EMERGENCY – NO REDUNDANCY ###
###################################################################################################
    echo >> $LOG
    echo >> $LOG
    echo "#############################################" >> $LOG
    echo " >>>>>> EMERGENCY - NO REDUNDANCY <<<<<<" >> $LOG
    ERRSUM=`expr $ERRSUM + 255`
    echo >> $LOG
    echo "`date +%H:%M:%S` - emergency - shutting down all relevant services" >> $LOG
    echo >> $LOG
    ### EMERG - IFACE
    echo "`date +%H:%M:%S` - >> shutting down interface $iface <<" >> $LOG
    ifconfig $iface down
    # check iface is down
    LINK=`ethtool $iface |grep "Link detected" |awk '{ print $3 }'`
    if [ "$LINK" = yes ];then
        echo " ABORT - Interface $iface is still up! abort.." >> $LOG
        exit
    elif [ "$LINK" = no ];then
        echo " OK - Interface $iface done" >> $LOG
    else
        echo " CONFUSION - Interface $iface - failure! abort.." >> $LOG
    fi
    #### EMERG - DRBD ROLE to SECONDARY
    echo "`date +%H:%M:%S` - >> drbd role to secondary <<" >> $LOG
    ROLE=`drbdadm role $resource |awk -F/ '{ print $1 }'`
    if [ "$ROLE" = "Primary" ];then
        echo " OK - drbd role is currently $ROLE - now set to Secondary" >> $LOG
        drbdadm disconnect $resource
        drbdadm secondary $resource
        if [ $? = 0 ];then
            sleep 1
            ROLE=`drbdadm role $resource |awk -F/ '{ print $1 }'`
            if [ "$ROLE" = "Secondary" ];then
                echo " OK - drbd role is now $ROLE" >> $LOG
            else
                echo " ABORT - drbd role is not secondary - role: $ROLE !! abort.." >> $LOG
            fi
        else
            echo " FAILED - drbd role to secondary has errors.. abort.." >> $LOG
        fi
    else
        echo " WARN - drbd role is not primary!! but.. forward.." >> $LOG
        echo "role: $ROLE"
    fi
    echo >> $LOG
    echo "******************************************" >> $LOG
    echo "`date +%H:%M:%S` EMERGENCY STATE: " >> $LOG
    echo "Network Interface $iface: `ethtool $iface |grep Link` " >> $LOG
    echo "DRBD Role Local $resource: `cat /proc/drbd |grep ro: |awk '{ print $3 }' |awk -F: '{ print $2 }' |awk -F/ '{ print $1 }'` " >> $LOG
    echo "DRBD Role Remote $resource: `cat /proc/drbd |grep ro: |awk '{ print $3 }' |awk -F: '{ print $2 }' |awk -F/ '{ print $2 }'` " >> $LOG
    echo "DRBD State Local $resource: `cat /proc/drbd |grep ro: |awk '{ print $4 }' |awk -F: '{ print $2 }' |awk -F/ '{ print $1 }'`" >> $LOG
    echo "DRBD State Remote $resource: `cat /proc/drbd |grep ro: |awk '{ print $4 }' |awk -F: '{ print $2 }' |awk -F/ '{ print $2 }'`" >> $LOG
    echo "" >> $LOG
    echo "********************" >> $LOG
    echo "Error Summary: $ERRSUM" >> $LOG
    echo "********************" >> $LOG
    echo "" >> $LOG
    echo "******************************************" >> $LOG
    echo "" >> $LOG
fi
TIMEEND=`date +%s`
TIMEDIFFSEC=`expr $TIMEEND - $TIMESTART`
# split the elapsed seconds into minutes and remaining seconds
while [ $TIMEDIFFSEC -gt 59 ]
do
    TIMEDIFFSEC=`expr $TIMEDIFFSEC - 60`
    TIMEDIFFMIN=`expr $TIMEDIFFMIN + 1`
done
echo "" >> $LOG
echo "Total Backup Duration: $TIMEDIFFMIN minutes, $TIMEDIFFSEC seconds" >> $LOG
echo "" >> $LOG
echo "----------------------------------------------------------------------------------" >> $LOG
echo "`date +%H:%M:%S` + `date +%Y.%m.%d` - end backup up raw files" >> $LOG
echo "" >> $LOG
echo "##################################################################################" >> $LOG
rm $LOCK
##### EMAIL NOTIFICATION ##########################################################################
cat $LOG |mail -s "$CUSTOMER $HOST Backup DRBD Raw Automated" $EMAIL1 -- -F $MAILSENDER
cat $LOG |mail -s "$CUSTOMER $HOST Backup DRBD Raw Automated" $EMAIL2 -- -F $MAILSENDER
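The state check in the script reads /proc/drbd four times with separate pipelines. As a possible simplification, a single awk call could return all four values at once. This is only a sketch: `drbd_fields` is a hypothetical helper that is not part of the script above, and it assumes the DRBD 8.x /proc/drbd layout, where each device line looks like ` 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----`.

```shell
# Sketch: print "local-role remote-role local-disk remote-disk" for one DRBD minor.
# "drbd_fields" is a hypothetical helper; the second argument defaults to /proc/drbd
# and exists so the parser can be tried against a saved copy of the file.
drbd_fields() {
    local minor=$1 file=${2:-/proc/drbd}
    awk -v m="${minor}:" '
        $1 == m {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ro:/) { split(substr($i, 4), ro, "/") }
                if ($i ~ /^ds:/) { split(substr($i, 4), ds, "/") }
            }
            print ro[1], ro[2], ds[1], ds[2]
            found = 1
        }
        END { exit !found }   # non-zero exit if the minor was not listed
    ' "$file"
}

# Example for /dev/drbd1:
#   set -- `drbd_fields 1`
#   ROLELOCAL=$1 ROLEREMOTE=$2 STATELOCAL=$3 STATEREMOTE=$4
```

Reading all four values from one snapshot of the file also avoids the small chance that the state changes between the separate reads.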