“read-balancing” with 8.4.1+

DRBD 8.4.1 introduces a new feature: read-balancing, which is configured in the disk section of the configuration file(s). This feature enables DRBD to balance read requests between the Primary/Secondary nodes.

While writes occur on both sides of the cluster, by default the reads are served locally (i.e., the value is prefer-local). This might not be optimal if you’ve got a big pipe to the other node and a heavily loaded IO subsystem.

read-balancing has several options to choose from:

  • 32K-striping up to 1M-striping choose the node to read from via the block address; e.g., for 512K-striping the first half of each MiB would be read from one machine, and the second half from the other.
    This is a simple, static load-balancing.
  • round-robin just passes the request to alternating nodes.
    This might go wrong if your application reads 4 KiB, 1 MiB, 4 KiB, 1 MiB, and so on, because all the large reads would then land on the same node, but this is fairly unlikely.
  • least-pending chooses the node with the smallest number of open requests.
  • when-congested-remote uses the remote node if local requests are already pending (that is, the local disk is congested).
  • prefer-remote is implemented for completeness, however as of this writing there is no viable use case.
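
As a rough illustration of how these policies pick a node, here is a minimal Python sketch. It is not DRBD's actual implementation (that lives in the kernel module); the function names, the LOCAL/REMOTE labels and the tie-breaking rule are assumptions made for this example only.

```python
from itertools import cycle

LOCAL, REMOTE = 0, 1  # hypothetical labels for the two nodes
K = 1024


def stripe_node(offset_bytes: int, stripe_bytes: int) -> int:
    """Static striping: alternate nodes every `stripe_bytes` of block address."""
    return (offset_bytes // stripe_bytes) % 2


def least_pending_node(pending_local: int, pending_remote: int) -> int:
    """Pick the node with fewer open requests (ties go local, an assumption)."""
    return LOCAL if pending_local <= pending_remote else REMOTE


def round_robin_load(read_sizes):
    """Hand requests to alternating nodes and return the bytes read per node."""
    load = {LOCAL: 0, REMOTE: 0}
    for size, node in zip(read_sizes, cycle([LOCAL, REMOTE])):
        load[node] += size
    return load


# 512K-striping: first half of each MiB from one node, second half from the other.
print([stripe_node(off, 512 * K) for off in (0, 256 * K, 512 * K, 768 * K, 1024 * K)])
# prints [0, 0, 1, 1, 0]

# The round-robin corner case: alternating 4 KiB and 1 MiB reads.
print(round_robin_load([4 * K, 1024 * K] * 3))
# prints {0: 12288, 1: 3145728}
```

The second output shows why the alternating pattern is unlucky for round-robin: one node serves only the small reads, while the other gets every large one.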

Please note that all this is still below the filesystem layer – so even if the secondary is used for reading, this won’t speed up a failover, as the pages read are not kept anywhere.

LINBIT participates in the German Cloud (“Deutsche Wolke”)

Deutsche Wolke (“German Cloud”) was founded to establish Federal Cloud Infrastructure in Germany.

This infrastructure will provide additional legal and security protections for hosted data.  No longer will small businesses be exposed to the legal risk of losing their website presence without a trial (an unfortunate reality when doing business on transatlantic clouds).

The natural partner for backend storage infrastructure is LINBIT; as authors and maintainers of DRBD, we are best suited to provide the technical expertise to achieve High Availability.  Also, DRBD Proxy is the obvious choice for off-site or disaster recovery replication (from the office into the cloud).

We at LINBIT look forward to seeing this project grow and prosper!

Monitoring: better safe than sorry…

Having stumbled upon the “Holy time-travellin’ DRBD, batman!” blog post, there’s only one thing to be said …

Be strict in what you emit, liberal in what you accept

is simply not true when dealing with mission-critical systems.

It’s OK to be alerted on upgrading a machine because the “old, working” RegEx that did the parsing doesn’t match anymore; it’s not a problem to get an email when someone adds the 100th DRBD resource and causes the grep to fail; and so on.

Better to have a few false positives when you’re actively changing things than to get a false negative that costs you months of data; that’s what an assert (and monitoring isn’t that different) is for, after all.

Keep monitoring strict, and let it fail loudly on unexpected things – after the first few occurrences they’re not unexpected anymore and can be dealt with.
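
To make “fail loudly” concrete, here is a minimal Python sketch of a strict check. The line shape is modelled on the per-resource status lines of a DRBD 8.x /proc/drbd; the function name, the exact regular expression and the choice of which states count as healthy are assumptions for illustration, not a ready-made monitoring plugin. The point is that a line the parser does not understand is reported as a problem instead of being skipped.

```python
import re

# Assumed shape of one per-resource status line, e.g.
# " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----"
STATUS_LINE = re.compile(
    r"^\s*(?P<minor>\d+): cs:(?P<cs>\w+) ro:(?P<ro>\w+/\w+) ds:(?P<ds>\w+/\w+)(?:\s|$)"
)


def check_status(lines):
    """Return a list of problems; anything unexpected counts as a problem."""
    problems = []
    if not lines:
        # Finding nothing at all is suspicious, not "all clear".
        problems.append("no status lines found")
    for line in lines:
        m = STATUS_LINE.match(line)
        if m is None:
            # Strict: an unparseable line is an alert, never silently ignored.
            problems.append(f"unparseable line: {line!r}")
            continue
        if m["cs"] != "Connected":
            problems.append(f"minor {m['minor']}: connection state {m['cs']}")
        if m["ds"] != "UpToDate/UpToDate":
            problems.append(f"minor {m['minor']}: disk state {m['ds']}")
    return problems


if __name__ == "__main__":
    sample = [
        " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----",
        " 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----",
        "some line in a format nobody anticipated",
    ]
    for problem in check_status(sample):
        print("ALERT:", problem)
```

If an upgrade changes the output format, every line stops matching and the check alerts immediately, which is the behaviour argued for above.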

Maximum volume size on DRBD

From time to time we get asked things like this:

“I want to use a 10TiB volume with DRBD, is that supported?”

The easiest way to answer things like that is to say “look for yourself” on the public DRBD usage page – the biggest public device size is ~220TiB, so go figure ;)

The current maximum device size is 1 PiB (1 pebibyte = 1024 tebibytes), so there’s a bit of room left.

DRBD needs about 32 MiB of RAM per TiB of storage, so for 1 PiB of storage you’ll need 32 GiB of RAM just for the DRBD bitmap. Having a bit more for the OS, userspace and buffer cache is left as an exercise for the reader.
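
The RAM figure can be checked with a little arithmetic. The sketch below assumes the bitmap uses one bit per 4 KiB block of storage, which matches the “about 32 MB per TB” rule of thumb; the function name is made up for this example.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB


def bitmap_bytes(device_bytes: int, block_bytes: int = 4 * KIB) -> int:
    """Bitmap size in bytes, assuming one bit per `block_bytes` of storage."""
    bits = -(-device_bytes // block_bytes)  # ceiling division
    return -(-bits // 8)


print(bitmap_bytes(1 * TIB) // MIB, "MiB for 1 TiB")        # 32 MiB
print(bitmap_bytes(10 * TIB) // MIB, "MiB for 10 TiB")      # 320 MiB
print(bitmap_bytes(1024 * TIB) // GIB, "GiB for 1024 TiB")  # 32 GiB
```

So the 10 TiB volume from the question above costs roughly 320 MiB of RAM for the bitmap.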

If you’ve got questions, ask the DRBD experts at LINBIT – we wrote the code, after all!

Editing the Pacemaker configuration with VIM

For people using the VIM editor I’ve got two small tips for editing Pacemaker configurations:

Use syntax highlighting. This helps to see unmatched quote characters easily. Whether it’s too colorful can be discussed, though ;)
A current version can be found here, and the mailing list post is here.

For correlating resource names I recommend the Mark plugin.