Monitoring: better safe than sorry…

Stumbling upon the “Holy time-travellin’ DRBD, batman!” blog post, there’s only one thing to be said …

Be strict in what you emit, liberal in what you accept [1]

is simply not true when dealing with mission-critical systems.

It’s OK to be alerted on upgrading a machine because the “old, working” RegEx that did the parsing doesn’t match anymore [2]; it’s not a problem to get an email when someone adds the 100th DRBD resource and causes the grep to fail; and so on.
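As a rough illustration of that attitude, here is a minimal Python sketch of a status parser that refuses to guess: any line it does not understand raises an error, so the monitoring alerts instead of silently skipping the line. The line format is an assumption modelled loosely on DRBD 8.x `/proc/drbd` device lines, and the names `STATUS_RE`, `UnparsableStatus` and `parse_status` are hypothetical, not part of any DRBD tooling.

```python
import re

# Assumed line shape (hypothetical, modelled on DRBD 8.x device lines), e.g.
#  " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+): cs:(?P<cs>\S+) ro:(?P<ro>\S+) ds:(?P<ds>\S+)"
)


class UnparsableStatus(Exception):
    """Raised when a status line does not match the expected format."""


def parse_status(lines):
    """Return {minor: (cs, ro, ds)}; raise on any line we do not understand."""
    result = {}
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no information
        match = STATUS_RE.match(line)
        if match is None:
            # Strict on input: better a false alarm than a missed failure.
            raise UnparsableStatus(f"unexpected status line: {line!r}")
        result[int(match["minor"])] = (match["cs"], match["ro"], match["ds"])
    return result
```

A monitoring wrapper would treat `UnparsableStatus` like any other failed check and send the alert; the point is only that an unknown format is never reported as “everything fine”.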

DRBD resources need different monitor intervals

As briefly mentioned in Pacemaker Explained, DRBD devices need two different values set for their monitor intervals:

primitive pacemaker-resource-name ocf:linbit:drbd         \
        params drbd_resource="drbd-resource"              \
        op monitor interval="61s" role="Slave"            \
        op monitor interval="59s" role="Master"

The reason is that Pacemaker distinguishes monitor operations by their resource and their interval, but not by their role. So if you do not make this distinction “manually” by giving each role its own interval, Pacemaker will monitor only one of the two (and, with DRBD 9, more) nodes, which is usually not what you want.
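To make the rule concrete, here is a small Python sketch that flags monitor operations whose intervals collide. It is a toy model of the behaviour described above (operations told apart by interval, not by role), not Pacemaker code; the function name `shadowed_monitors` and the `(role, interval)` tuple layout are assumptions made for the example.

```python
def shadowed_monitors(ops):
    """Find monitor ops of one resource that share an interval.

    ops: list of (role, interval_in_seconds) tuples for a single resource.
    Returns a list of (first_role, clashing_role, interval) tuples.
    Which of the clashing operations would actually run is not modelled here.
    """
    seen = {}      # interval -> role that first used it
    clashes = []
    for role, interval in ops:
        if interval in seen:
            clashes.append((seen[interval], role, interval))
        else:
            seen[interval] = role
    return clashes


# Same interval for both roles: reported as a clash.
print(shadowed_monitors([("Master", 60), ("Slave", 60)]))
# Distinct intervals, as in the configuration above: no clash.
print(shadowed_monitors([("Slave", 61), ("Master", 59)]))
```

A check like this could run against an exported configuration before it is loaded, so a copy-and-paste of identical intervals gets caught early.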