Split brain, with DRBD, is much less of a disaster than in conventional cluster setups employing shared storage. But, you ask, how can I protect my DRBD cluster against split brain in the first place? Here’s how.
Let’s briefly reiterate what split brain, in the DRBD sense, really means. DRBD split brain occurs when your nodes have lost their replication link due to network failure, and you make both nodes Primary after that.
When just the replication link dies, Heartbeat as the cluster manager will still be able to “see” the peer node via an alternate communication path (which you hopefully have configured, see this post). Thus, there is nothing that would keep Heartbeat from migrating resources to that DRBD-wise disconnected node if it so decides or is so instructed. That would cause precisely the DRBD split brain situation described above.
If that happens, your cluster manager will have created two diverging sets of data. When that occurs, manual intervention is, for all practical purposes, inevitable. Not a desirable situation.
Enter dopd, the DRBD outdate-peer daemon. The second dopd detects a connection failure between peer DRBD nodes, it talks to Heartbeat and instructs it to use whatever communication paths it still has available to contact the remote node. Then, dopd on the peer node will outdate the DRBD resource there (set the Outdated flag in DRBD metadata). DRBD will subsequently stubbornly refuse to become Primary on that node under any circumstances. That is, until the network connection is re-established and DRBD is confident that the local copy of the data is UpToDate again. This effectively prevents DRBD split brain from happening, and makes sure that your cluster service will not run on a cluster node that has a bad (outdated) set of data.
To enable dopd, just add these lines to your ha.cf on both nodes:
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster
You may have to adjust dopd’s path according to your preferred distribution.
Afterwards, run /etc/init.d/heartbeat reload or the equivalent command for your distribution. You should now see dopd as a running process in your process table (hint: ps ax | grep dopd).
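If dopd does not show up in the process table, a quick look at ha.cf can rule out a typo in the two directives. The following is only a minimal sketch: the function name check_dopd_config is made up for illustration, and /etc/ha.d/ha.cf is an assumed default location that may differ on your distribution.

```shell
#!/bin/sh
# Hypothetical helper (not part of Heartbeat or DRBD): checks that an ha.cf
# file contains both dopd directives shown above.
# The default path /etc/ha.d/ha.cf is an assumption; pass your own path as $1.
check_dopd_config() {
    cf="${1:-/etc/ha.d/ha.cf}"

    # "respawn <user> <path>/dopd" tells Heartbeat to start and supervise dopd.
    grep -Eq '^[[:space:]]*respawn[[:space:]]+hacluster[[:space:]]+.*/dopd[[:space:]]*$' "$cf" || {
        echo "missing: respawn line for dopd in $cf"
        return 1
    }

    # "apiauth dopd ..." lets dopd talk to the Heartbeat API.
    grep -Eq '^[[:space:]]*apiauth[[:space:]]+dopd[[:space:]]' "$cf" || {
        echo "missing: apiauth line for dopd in $cf"
        return 1
    }

    echo "dopd directives found in $cf"
}
```

Source the file and call check_dopd_config (optionally with the path to your ha.cf) on both nodes. It only checks that the lines are present; it does not verify that the dopd binary actually exists at the configured path.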
Then, add these items to your DRBD resource configuration (again, on both nodes):
common {
  handlers {
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";
  }
  # other common settings go here
}
resource my-resource {
  disk {
    fencing resource-only;
  }
  # other resource-specific settings go here
}
Finally, issue drbdadm adjust all on both nodes to reconfigure your resources and reflect your drbd.conf changes.
Now, unplug your DRBD replication link. Observe /proc/drbd on your Secondary:
version: 8.0.5 (api:86/proto:86)
SVN Revision: 3011 build by buildsystem@barschlampe, 2007-08-03 07:44:08
0: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown C r---
ns:0 nr:14 dw:14 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
The Secondary is now considered Outdated. If you feel like it, you may now attempt to manually switch over one of your DRBD-backed resources. It won’t come up on the remote node because it now potentially has outdated data.
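If you would rather check the disk state from a script than read /proc/drbd by eye, something along these lines may help. It is a sketch that assumes the DRBD 8.0 output layout shown above (a device line of the form "N: cs:... st:... ds:Local/Peer ..."); the function name local_disk_states is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints "<minor> <local disk state>" for every device
# line found in /proc/drbd-style input. The local state is the part of the
# ds: field before the slash. Reads /proc/drbd unless a file is given as $1.
local_disk_states() {
    awk '/^ *[0-9]+: cs:/ {
        minor = $1
        sub(/:$/, "", minor)              # "0:" -> "0"
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^ds:/) {
                # "ds:Outdated/DUnknown" -> parts[1] = "Outdated"
                split(substr($i, 4), parts, "/")
                print minor, parts[1]
            }
        }
    }' "${1:-/proc/drbd}"
}
```

Run against the output above, this prints 0 Outdated. After the replication link is back and the resync has finished, the same call should report UpToDate for that device.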
Re-plug your DRBD replication link. Your Secondary will briefly re-sync and then be in UpToDate state again. A manual Heartbeat resource switch-over should now succeed.
October 2, 2007 at 0:37
I did exactly this one or two weeks ago, following the instructions in a file in the DRBD source.
I found that sometimes, when the link comes up again, the DRBD status on one side stays StandAlone instead of automatically connecting and syncing.
I have to run drbdadm connect to make it connect.
Also, I have two resources on the same link on different ports. When the outdate happens, /proc/drbd seems to show Outdated on one of the resources, but not both.
I am new to DRBD, still learning and trying.
October 2, 2007 at 9:45
Can I ask you to post your issue on the drbd-user mailing list? Please do not forget to give your exact DRBD, Heartbeat and kernel version, and attach your drbd.conf and ha.cf, and relevant portions of your syslog.
You may subscribe to the mailing list at http://lists.linbit.com/listinfo/drbd-user
November 30, 2007 at 22:20
This appears to have the effect of keeping ha from rolling over to a node with no connections problems. Which means that the system only works if both nodes are working, correct? I’m trying to figure out how to keep an ha failed node from becoming primary again (once it has been rolled to secondary by ha) without user intervention. Can I use dopd for that somehow? I submitted this question to the user list but this is the first time I’ve heard about dopd.
Thoughts??
Rois
November 30, 2007 at 22:42
Rois,
your assumption is incorrect. A highly available system that only runs when both nodes are up wouldn’t be highly available. dopd just keeps you from failing over to a node that happens to have bad data.
If you just want to disable automatic failback — and it seems like that’s what you’re trying to do — just set auto_failback to off (Heartbeat version 1 and version 2 in non-CRM mode), or set a high default resource stickiness (Heartbeat 2 with CRM).
Cheers,
Florian
December 7, 2007 at 18:31
Florian,
Can you tell me how to get more debugging info? dopd does not appear to be working and I believe I’ve tracked it down to drbdsetup NOT being able to outdate the local resources. I’ve tried running the command manually on the peer and it does nothing. I know outdate works somehow because “fencing resource-only;” is doing it.
Here is my basic understanding of what dopd is supposed to do:
Primary sends a message to the secondary to outdate itself. The secondary is supposed to run the command
drbdadm outdate all.
If I run “drbdadm -d outdate all” it shows me that what is being run is:
drbdsetup /dev/drbd0 outdate
drbdmeta /dev/drbd0 v08 /dev/vg0/home internal outdate
Running drbdsetup on the secondary does not outdate the resources, does not return any messages and does not log anything to syslog.
Running drbdmeta returns “Device ‘/dev/drbd0′ is configured!” but does not outdate the resource and does not log anything.
Is there any other documentation on dopd or drbd that might help me figure out what my problem is.
Thx
December 7, 2007 at 19:27
OK . . . Backtracking a little here. I would still like to know where I can get some dopd docs, but apparently “drbdadm outdate all” doesn’t work unless the DRBD nodes are disconnected. I tried disconnecting and it worked.
Upon closer inspection of the dopd syslog error it appears that it’s looking for drbdadm in /sbin and my distro has it in /usr/sbin. I tried copying and ln -s drbdadm, drbdsetup and drbdmeta to /sbin but neither option works and now the log shows
unknown exit code from /sbin/drbdadm outdate all: 126
instead of
unknown exit code from /sbin/drbdadm outdate all: 127
Any clues to where the path problem is?
I’ll put this on the forum as well.
Thanx
January 2, 2008 at 3:48
I’m looking at a setup where the only available communication between nodes would be a DSL connection. I’d like to have a dial-on-demand PPP connection available, so that in situations where the replication link is broken I can protect the data against split brain issues. Would dopd help me there?
January 2, 2008 at 10:39
Edward,
dopd is unlikely to help you in your scenario, as it would usually make little sense to use DRBD over a DSL connection. Except perhaps with protocol A and extremely low write load. But you’re probably better off with csync2.
January 2, 2008 at 15:12
I know this isn’t the “forum” but can you explain why it wouldn’t make sense to use it over a DSL link? I’ll have at least 250Kbits available for DRBD replication.
January 2, 2008 at 19:13
Block-level synchronous replication over a 250kbps link? Good luck. But don’t come complaining if your performance goes down the tubes.