I recently came across this blog post with the catchy title of “DRBD and MySQL: Just Say No”. Now while I have absolutely no issue with people not liking DRBD or finding that it doesn’t fit their needs, I couldn’t help but notice that the post recycles some persistent myths about DRBD, which could use some correction.
I’ve tried to reply using a blog comment, but alas it seems I was moderated to /dev/null. Enter the “Write Post” button on my trusted WordPress dashboard.
So let’s look at the alleged “MySQL with DRBD Minuses” listed in said post:
DRBD partition corruption means failover node would be unusable (disadvantage of shared storage) and failback could destroy original master too.
If the filesystem that sits on top of DRBD gets corrupted, it will be equally corrupted on the peer node after failover. DRBD is a block device; it is agnostic of layers above it. That much is correct. Which incidentally is also the reason why you can use DRBD to add HA not only to databases, but to file services, virtualization, storage, etc. But I’m going off on a tangent.
“Failback could destroy the original master too”, however, is plain false. DRBD won’t “destroy the original master” any more than it already was if the filesystem on top of DRBD was fried beforehand.
Now as for the actual partition (meaning the backing device DRBD resides upon), DRBD adds to data security, not the opposite. DRBD will automatically detach from backing devices that throw I/O errors. And if you happen to get random errors (“bit flips”) in local data blocks due to I/O subsystem malfunction, online verify will catch that.
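To make the idea behind online verify concrete, here is a minimal Python sketch. It is not DRBD code: the block size, the use of SHA-1, and the function names are assumptions made for this illustration only. It hashes two device images block by block and reports which blocks disagree.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size chosen for this sketch


def block_digests(device: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-1 digest per fixed-size block of a device image."""
    return [
        hashlib.sha1(device[offset:offset + block_size]).digest()
        for offset in range(0, len(device), block_size)
    ]


def out_of_sync_blocks(local: bytes, peer: bytes) -> list[int]:
    """Compare per-block digests and return the indices that differ."""
    if len(local) != len(peer):
        raise ValueError("both devices must have the same size")
    pairs = zip(block_digests(local), block_digests(peer))
    return [index for index, (a, b) in enumerate(pairs) if a != b]


# Two identical four-block "devices", then one bit flip in block 2 of the local copy.
peer_device = bytes(range(256)) * 64          # 16 KiB of sample data
local_device = bytearray(peer_device)
local_device[2 * BLOCK_SIZE + 10] ^= 0x01     # simulate silent local corruption

print(out_of_sync_blocks(bytes(local_device), peer_device))  # [2]
```

Only digests need to be compared, not the blocks themselves, which is what makes a scheme like this cheap enough to run against a live device.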
If the master panics, then after failover both fsck and transaction logs replay must be performed.
Transaction log replay, yes. But fsck? With a journaling filesystem, this amounts to a journal replay these days, which takes under a second in most circumstances.
NIC and network corruption is also propagated.
False again. We have end-to-end replication integrity checking to prevent just that.
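The principle of an end-to-end check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DRBD’s wire format: the framing (digest first, payload after), the choice of SHA-1, and the names `frame`, `unframe` and `CorruptPacket` are all hypothetical.

```python
import hashlib


class CorruptPacket(Exception):
    """Raised when the received payload does not match its digest."""


def frame(payload: bytes) -> bytes:
    """Sender side: prepend a SHA-1 digest computed before the data reaches the NIC."""
    return hashlib.sha1(payload).digest() + payload


def unframe(packet: bytes) -> bytes:
    """Receiver side: recompute the digest and refuse to accept a mismatch."""
    digest_size = hashlib.sha1().digest_size
    digest, payload = packet[:digest_size], packet[digest_size:]
    if hashlib.sha1(payload).digest() != digest:
        raise CorruptPacket("payload digest mismatch, block must be resent")
    return payload


block = b"page of InnoDB data" * 100
packet = bytearray(frame(block))
assert unframe(bytes(packet)) == block   # a clean transfer is accepted

packet[40] ^= 0x04                       # flip one bit "on the wire"
try:
    unframe(bytes(packet))
except CorruptPacket as err:
    print("rejected:", err)
```

Because the digest is computed above the NIC and its driver on the sending side and checked above them on the receiving side, anything those layers mangle in between shows up as a mismatch instead of being written to disk.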
Failover node is a cold standby, cannot accept database traffic if that would change the DRBD partition.
The failover node is a hot standby, it’s just not a running slave node from the database’s standpoint. And, nothing stops you from running two databases on two servers on two DRBD devices laid out in a “criss-cross” fashion, converging on one node in case of node failure.
Could generate a lot of network traffic.
On a busy database, yes. But if you follow our design guidelines that always recommend a separate DRBD replication connection, this won’t hurt your application at all.
Cannot do maintenance on cold standby database.
But you can do anything you want with a database that you run off a DRBD LVM snapshot. Works on a Secondary node too.
2 heartbeats needed on a reliable, local network.
So I don’t see how this would be a minus, but then maybe that’s just me.
So to sum up: yes, we’ve seen all this before, and surprise, surprise, Eric Bergen’s “DRBD in the real world” has been quoted in that post as well. Now while I concede that some of the points Eric had made were valid at the time (and some continue to be), a lot of what he said then is now outdated, superseded, or has been addressed in DRBD releases made months ago. But to Eric’s credit, he fostered a lively discussion in the comments to his post, so I do encourage you to take a look.
April 27, 2008 at 17:53
> NIC and network corruption is also propagated.
That one is interesting, since it was mentioned as a minus for DRBD when it’s just as true for MySQL replication.
The worklog for event checksums in MySQL is #2540. Until that’s implemented, DRBD is doing *better* (not worse) in this regard.
April 27, 2008 at 18:16
Though we haven’t run DRBD in production yet, it is our plan (we’re currently setting up in QA for perf and failover testing).
We run all our databases (SQLServer and MySQL) on a SAN for multiple reasons, and for most of the reasons you talk about (snapshots, DRBD sitting below the file system, etc) we have decided on this as a great HA solution for us. It also allows us to have each node attached to separate storage processors on the SAN, which will allow for a full storage processor failure without taking out our database system.
This is also way faster than any non-block device replication that I know of.
The issue with replication as a sole setup for HA is that it requires intelligent coding in the application should you have to switch to the slave as ‘master’, because there is no shared IP that gets migrated. Though there are patches for auto-promoting masters from Google, it’s not really the most straightforward process.
Ideally, if you can afford both (as in our case) from a disk usage perspective, then you get the benefits from both….
April 27, 2008 at 21:02
What specifically has changed to address my concerns, and in which versions of DRBD? I’ll gladly add updated sections to that blog entry so people aren’t getting outdated information.
April 27, 2008 at 21:43
Eric,
in your post you refer broadly to “any corruption on the primary master” getting propagated over to the DRBD peer. Now as I’ve stated and will continue to state, you’re right as far as upper I/O layers are concerned. And this won’t get “fixed” unless someone redesigns the complete Linux I/O stack.
But looking at possible issues _below_ DRBD, we’re chipping away at possible sources of corruption one by one (note that these are sources of corruption _outside_ DRBD that DRBD just happens to handle gracefully or rectify):
- Disk I/O errors on any node: Automatic detach, introduced pre-DRBD 8.0
- Network bit flips/network traffic corruption/NIC driver bugs: End-to-end replication integrity checks, introduced in 8.2.0
- Subtle disk I/O errors, bit flips, local-disk data corruption: Online device verification, introduced in 8.2.5
The latency concerns you mentioned have also greatly been mitigated by better CPU affinity handling introduced in 8.2.3. This has been back-ported to the 8.0 branch as well, and makes a particularly big difference on multi-core systems.
April 28, 2008 at 21:12
How is MySQL replication worse off than DRBD in the case of NIC/network corruption, if the binary logs contain full SQL statements? If it mangles a byte, changing INSERT to INSERQ, replication will break, but it won’t destroy your data on the slave. Now if it hits a blob field of binary data, then yes, that might be a problem. I would think that would be less likely to happen, or at least more dependent upon the schema and usage patterns.
April 29, 2008 at 5:50
[...] these days we see a lot of post for and against (more, more) using of MySQL and DRBD as a high availability [...]
April 29, 2008 at 9:52
Kris Buytaert adds an interesting angle to the discussion in a post titled “DRBD and MySQL: often say NO”; see http://www.krisbuytaert.be/blog/node/657.
April 29, 2008 at 9:57
Bill,
you’re right, a MySQL replication statement is probably more likely to altogether fail due to network corruption, rather than propagate garbage. But “more likely” doesn’t mean “certain”.
Just like for DRBD, without replication integrity checking, there is a chance that network corruption affects the DRBD protocol header rather than the packet payload. This would cause DRBD on the remote end to receive (and discard) a malformed packet. You just can’t tell for sure. Which is why in DRBD we adopted the end-to-end approach.
April 30, 2008 at 14:36
@Bill: FWIW, you are correct assuming it is statement based replication. In the case of row based replication, it’s possible that a bit flip could cause a syntactically correct ‘event’ that will corrupt data.
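The difference between the two cases can be shown with a small Python sketch. The encodings here are assumptions for illustration and are not MySQL’s actual binlog formats: the statement is plain text, the row value is a packed little-endian integer, and `flip_bit` is a hypothetical helper.

```python
import struct


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


# Statement-style event: the flip lands in the keyword and yields invalid SQL.
statement = b"INSERT INTO t VALUES (1000)"
print(flip_bit(statement, 5, 0))         # b'INSERU INTO t VALUES (1000)'

# Row-style event: the value travels as a packed integer (hypothetical encoding).
row_value = struct.pack("<i", 1000)
damaged = flip_bit(row_value, 1, 4)
print(struct.unpack("<i", damaged)[0])   # 5096, a valid integer but the wrong one
```

Note that a flip inside the literal of the text statement (turning a ‘0’ into a ‘1’, say) would also produce valid SQL with a wrong value, so the statement-based case is less likely to go unnoticed, not immune.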