Ryzen and ECC in the real world



Introduction

So, ECC memory and consumer Ryzen platforms. AMD sanctions it, but what about actually taking advantage of it? That’s where finding information becomes unnecessarily complicated. You find posts where ECC is reported as enabled but the poster never says whether it was really working (i.e. corrected and uncorrectable errors reported to the OS), posts where ECC works correctly on a board that isn’t yours, or posts with no conclusion whatsoever.

But what about the manufacturer?

[Screenshot: ASUS tech specs page with ECC highlighted]

Motherboard manufacturers aren’t always particularly helpful either. If you look at the tech specs page of the Asus ROG Strix X570-E Gaming board that my desktop has, you’ll see that ECC is explicitly mentioned several times as supported, but looking up the locally available unbuffered ECC part numbers in the memory QVL yields no results. At the same time, the firmware setup has ECC options fully exposed, so they must have tested it on some level.

There is only one way to find out for sure: take the plunge and buy some ECC memory myself. I chose two Kingston KSM26ED8/32ME 32GB 2666MHz CL19 sticks for this experiment, which leaves room to upgrade to 128GB down the line if everything works as it should.

CL19 sounds bad if you compare to consumer memory, but in reality the difference it makes in real-world applications is generally small. Plus since it’s ECC you can attempt overclocking and timing tweaks with more confidence than you could with non-ECC memory, assuming error reporting works correctly.

Testing that ECC is reported correctly

Under Linux, testing that ECC is reported correctly is simple enough: grep dmesg for EDAC and make sure DDR ECC is detected. edac-util -v shows the correction metrics, and dmidecode -t memory tells you all there is to know about your currently installed memory, including the error correction type and the Data/Total Width.
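If you’d rather read the counters yourself, the kernel’s EDAC subsystem exposes them through sysfs. Here’s a minimal Python sketch, assuming the usual /sys/devices/system/edac/mc layout with one mcN directory per memory controller; the function name read_edac_counts is my own, and this is not something I used for the test in this post.

```python
from pathlib import Path

def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from the EDAC sysfs tree.

    read_edac_counts is a hypothetical helper name; the default path is the
    standard EDAC sysfs location, assumed to be present on the system.
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        # EDAC driver not loaded, or this is not a Linux system.
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (corrected, uncorrected)
    return counts

for name, (corrected, uncorrected) in read_edac_counts().items():
    print(f"{name}: corrected={corrected} uncorrected={uncorrected}")
```

An empty result means no memory controller was registered with EDAC, which is worth sorting out before drawing any conclusions about ECC.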

Under Windows, testing that ECC is being reported correctly is just as simple. Run one or both of these WMI lookups:

C:\Users\dgurney>wmic memorychip get datawidth,totalwidth
DataWidth  TotalWidth
64         64
64         64
64         64
64         64

C:\Users\dgurney>wmic memphysical get memoryerrorcorrection
MemoryErrorCorrection
3

With non-ECC memory, as in the output above, the TotalWidth value matches DataWidth and MemoryErrorCorrection is reported as 3 (none). With ECC, you want to see TotalWidth exceed DataWidth (72 bits versus 64), and MemoryErrorCorrection should be 6 (multi-bit ECC) in most cases.
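To make the interpretation explicit, here’s a minimal Python sketch of the same check. The helper name describe_ecc is hypothetical, and the code table follows the values Microsoft documents for the MemoryErrorCorrection property of Win32_PhysicalMemoryArray. It only interprets numbers you have already read with wmic; it does not query WMI itself.

```python
# MemoryErrorCorrection values as documented for Win32_PhysicalMemoryArray.
ERROR_CORRECTION_TYPES = {
    0: "Reserved",
    1: "Other",
    2: "Unknown",
    3: "None",
    4: "Parity",
    5: "Single-bit ECC",
    6: "Multi-bit ECC",
    7: "CRC",
}

def describe_ecc(data_width, total_width, correction_code):
    """Hypothetical helper: summarize what the two wmic lookups imply."""
    extra_bits = total_width - data_width
    correction = ERROR_CORRECTION_TYPES.get(correction_code, "Unrecognized")
    # ECC is only plausibly reported when the modules carry extra check bits
    # AND the firmware advertises an ECC correction type.
    ecc_reported = extra_bits > 0 and correction_code in (5, 6)
    return {
        "extra_bits": extra_bits,
        "correction": correction,
        "ecc_reported": ecc_reported,
    }

print(describe_ecc(64, 64, 3))  # the non-ECC readout
print(describe_ecc(64, 72, 6))  # what an ECC readout should look like
```

Keep in mind this only tells you what the firmware reports, not whether errors are actually being corrected.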

The next question is simple: did my ECC memory get detected properly?

C:\Users\dgurney>wmic memorychip get datawidth,totalwidth
DataWidth  TotalWidth
64         72
64         72


C:\Users\dgurney>wmic memphysical get memoryerrorcorrection
MemoryErrorCorrection
6

YES! Everything reads exactly as it should. On to step two…

Testing that ECC is working

The tricky part, once you’ve made sure ECC is detected by the OS, is verifying that it actually works. The easiest way on consumer hardware is to key in a very demanding overclock and timings, then turn down the voltage to provoke memory errors. In my case, I keyed in the following settings: 3600MHz (19-19-19-30) @ 1.25 V.

Under Windows, look for WHEA-Logger corrected hardware error events in Event Viewer. Under Linux, look at dmesg output or edac-util -v for corrected errors. If you don’t see even a single corrected error no matter how unstable your settings are, ECC is either working silently or not at all.
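One way to make the Linux side of that check less eyeball-dependent is to note the error counts before and after the stress run and compare them. A minimal Python sketch, assuming you have recorded per-controller (corrected, uncorrected) counts yourself, for example from edac-util -v; the function name new_errors and the sample numbers are purely illustrative, not measurements from my machine.

```python
def new_errors(before, after):
    """Return per-controller (corrected, uncorrected) increases.

    before/after are hypothetical snapshots shaped like
    {"mc0": (corrected_count, uncorrected_count)}.
    """
    deltas = {}
    for name, (ce_after, ue_after) in after.items():
        # A controller missing from the first snapshot is treated as zero.
        ce_before, ue_before = before.get(name, (0, 0))
        deltas[name] = (ce_after - ce_before, ue_after - ue_before)
    return deltas

# Illustrative numbers only.
before = {"mc0": (0, 0)}
after = {"mc0": (12, 0)}
print(new_errors(before, after))
```

A positive corrected delta with a zero uncorrected delta is the result you’re hoping for: the settings are unstable, and ECC is catching and reporting it.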

For my testing I chose Prime95 in Large FFTs mode, since that stresses memory the most.

Drumroll please… is my ECC RAM actually fully working?

[Screenshots: WHEA events in Event Viewer; an opened corrected hardware error WHEA event]

YES! Plenty of corrected errors in the event log! Thank god for that, I hate returning things.

Conclusion

So, at least on Asus boards that mention ECC anywhere in the spec sheet, it should work fine even if the QVL has no validated parts. But ultimately, your best bet is to take advantage of return policies and just try it yourself, unless you have (or are going to buy) a known-good motherboard.

Bonus ramble: would I recommend it?

If the lack of RGB, premade XMP profiles, and pretty heatsinks doesn’t bother you, the additional peace of mind is well worth it. The risk of silent data corruption and the number of inexplicable crashes are drastically lowered, and a failing stick is very obvious since you have error logging.

The amount of time in my life I’ve spent troubleshooting bizarre issues that ended up being bad or poorly configured memory would have netted me a fair amount of money had it been paid work!