This post outlines some of the basic concepts involved in DRAM technologies. It also tries to explain some of the error sources and error handling mechanisms present in modern servers.
DDR4 basics
Before we go any further, it is important to understand some of the basics of DRAM architecture. The details are explained well in [7], so we will only go over a quick summary. This is what a DRAM chip looks like from the outside:


When we dive in one level, we see it is organized into:
- Bank groups
- Banks
- Rows
- Columns

If we go in one more level, we see the following in each bank:
- Memory arrays
- Row decoders
- Column decoders
- Sense amplifiers

Once the Bank Group and Bank have been identified, the Row part of the address activates a line in the memory array. This is called the “Word Line”, and activating it reads data from the memory array into the “Sense Amplifiers”. The Column address then reads out a part of the word that was loaded into the Sense Amps. The lines that carry each bit between the memory array and the Sense Amps are called “Bit Lines”.
The width of a column is standard: it is either 4 bits, 8 bits or 16 bits wide, and DRAMs are classified as x4, x8 or x16 based on this column width. Note that the width of the DQ data bus is the same as the column width. So, to simplify things, it can be said that DRAMs are classified based on the width of the DQ bus.
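The addressing steps above can be sketched as a simple bit-field split. This is only an illustration: the flat address layout (bank group, then bank, then row, then column, from most to least significant bits) and the names `DramAddress` and `decode` are assumptions made for this example, and the field widths are those of a 4Gb x4 device. Real memory controllers use their own, often interleaved, address mappings.

```python
from typing import NamedTuple


class DramAddress(NamedTuple):
    bank_group: int
    bank: int
    row: int
    column: int


# Field widths for a 4Gb x4 device: 4 bank groups, 4 banks per group,
# 64K rows and 1K columns.
BG_BITS, BA_BITS, ROW_BITS, COL_BITS = 2, 2, 16, 10


def decode(addr: int) -> DramAddress:
    """Split a flat address, assuming the hypothetical layout
    [bank group | bank | row | column] from MSB to LSB."""
    column = addr & ((1 << COL_BITS) - 1)
    addr >>= COL_BITS
    row = addr & ((1 << ROW_BITS) - 1)
    addr >>= ROW_BITS
    bank = addr & ((1 << BA_BITS) - 1)
    addr >>= BA_BITS
    bank_group = addr & ((1 << BG_BITS) - 1)
    return DramAddress(bank_group, bank, row, column)


example = (3 << 28) | (1 << 26) | (0x1234 << 10) | 0x3FF
print(decode(example))
```

The row field selects which Word Line is activated, and the column field selects which 4-bit slice of the Sense Amplifier contents is driven onto the DQ bus.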
At the lowest level, a bit is essentially a capacitor that holds the charge and a transistor acting as a switch. It is important to keep this picture of a bit in mind in order to understand the sources of various errors.

Since the capacitor discharges over time, the information eventually fades unless the capacitor voltage is periodically refreshed. This is where the ‘D’ in DRAM comes from – it refers to Dynamic as compared to SRAM (Static Random Access Memory).

DRAM Size Calculation
We can try to make some more sense of the above table by hand-calculating two of the sizes:
/* 4Gb x4 Device */
Number of Row Address bits: A0-A15 = 16 bits
Total number of rows = 2^16 = 64K
Number of Column Address bits: A0-A9 = 10 bits
Number of columns per row = 1K
Width of each column = 4 bits
Number of Bank Groups = 4
Number of Banks = 4
Total DRAM Capacity = Num.of.Rows x Num.of.Columns x Width.of.Column x Num.of.BankGroups x Num.of.Banks
Total DRAM Capacity = 64K x 1K x 4 x 4 x 4 = 4Gb
/* 4Gb x8 Device */
Number of Row Address bits: A0-A14 = 15 bits
Total number of rows = 2^15 = 32K
Number of Column Address bits: A0-A9 = 10 bits
Number of columns per row = 1K
Width of each column = 8 bits
Number of Bank Groups = 4
Number of Banks = 4
Total DRAM Capacity = Num.of.Rows x Num.of.Columns x Width.of.Column x Num.of.BankGroups x Num.of.Banks
Total DRAM Capacity = 32K x 1K x 8 x 4 x 4 = 4Gb
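The same hand calculation can be reproduced in a few lines of Python. This is a minimal sketch; the function name `dram_capacity_bits` and its parameter names are made up for this example, and the bank count is taken to be per bank group, as in the calculation above.

```python
def dram_capacity_bits(row_bits: int, col_bits: int, width: int,
                       bank_groups: int, banks_per_group: int) -> int:
    """Capacity = rows x columns x column width x bank groups x banks."""
    rows = 1 << row_bits
    columns = 1 << col_bits
    return rows * columns * width * bank_groups * banks_per_group


GIBIBIT = 1 << 30  # "Gb" in the device name means 2^30 bits

x4_device = dram_capacity_bits(row_bits=16, col_bits=10, width=4,
                               bank_groups=4, banks_per_group=4)
x8_device = dram_capacity_bits(row_bits=15, col_bits=10, width=8,
                               bank_groups=4, banks_per_group=4)

print(x4_device // GIBIBIT, x8_device // GIBIBIT)  # prints "4 4"
```

The x8 device has half as many rows as the x4 device but columns that are twice as wide, so both come out to the same 4Gb.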
DRAM Page Size
In the table above, there’s a mention of Page Size. Page size is essentially the number of bits per row. Put another way, it is the number of bits loaded into the Sense Amplifiers when a row is activated. Since the column address is 10 bits wide, there are 1K columns per row. So, for a x4 device the number of bits per row is 1K x 4 = 4K bits (or 512B). Similarly, for a x8 device it is 1KB and for a x16 device it is 2KB per page.
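As a quick check of the page sizes quoted above, here is a small sketch that assumes a 10-bit column address for every device width. The helper name `page_size_bytes` is made up for this example.

```python
def page_size_bytes(col_bits: int, width: int) -> int:
    """Bits per row (columns x column width), converted to bytes."""
    return (1 << col_bits) * width // 8


for width in (4, 8, 16):
    print(f"x{width}: {page_size_bytes(10, width)} bytes")
# x4: 512 bytes
# x8: 1024 bytes
# x16: 2048 bytes
```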
Let's explore some more concepts in DRAM architecture.
Rank (Depth Cascading)
When dealing with DRAMs you’ll come across terminology such as Single-Rank, Dual-Rank or Quad-Rank. Rank is the highest logical unit and is typically used to increase the memory capacity of your system.
Say you need 16Gb of memory. Depending on what’s available in the market and what is cheaper, you could use a single 16Gb memory die. This would be a Single-Rank system, because you need just 1 ChipSelect signal (CS_n) to read all the contents of the memory. Or you could choose to have 2 individual 8Gb discrete devices soldered down on the PCB (because 2x8Gb devices happen to be cheaper than 1x16Gb). In this case the 2 devices are connected to the same address and data buses, but you need 2 ChipSelects to address each device separately. Since you need two ChipSelects, this setup is called Dual-Rank.
One other DRAM variety you may come across is a “Dual-Die Package” or DDP. In this case you have a single DRAM chip soldered on the board, but internally the package contains a stack of 2 dies. The two dies once again share address and data lines but have separate chip selects, making it a Dual-Rank device.
Common terms and their definitions
We will now explore some of the definitions that are common in the DRAM area:
DDR stands for Double Data Rate [2]. Data is transferred on both the rising and falling edges of the clock, so the data signals operate at the same limiting frequency as the clock while doubling the data transmission rate.
DDR speeds are referred to in MT/s (Mega Transfers per Second) or GB/s (Giga Bytes per second). DDR SDRAM popularized the technique of referring to the bus bandwidth in megabytes per second, the product of the transfer rate and the bus width in bytes. DDR SDRAM operating with a 100 MHz clock is called DDR-200 (after its 200 MT/s data transfer rate), and a 64-bit (8-byte) wide DIMM operated at that data rate is called PC-1600, after its 1600 MB/s peak (theoretical) bandwidth. Likewise, 1.6 GT/s transfer rate DDR3-1600 is called PC3-12800.[2]
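The naming arithmetic above can be sketched as follows. The function names are made up for this example, and the 64-bit default bus width is the DIMM width mentioned in the text.

```python
def transfer_rate_mt_s(clock_mhz: int) -> int:
    """Two transfers per clock cycle, one on each clock edge."""
    return clock_mhz * 2


def peak_bandwidth_mb_s(rate_mt_s: int, bus_width_bits: int = 64) -> int:
    """Peak (theoretical) bandwidth = transfer rate x bus width in bytes."""
    return rate_mt_s * bus_width_bits // 8


ddr_200 = transfer_rate_mt_s(100)  # 100 MHz clock
print(f"DDR-{ddr_200} -> PC-{peak_bandwidth_mb_s(ddr_200)}")  # DDR-200 -> PC-1600
print(f"DDR3-1600 -> PC3-{peak_bandwidth_mb_s(1600)}")        # DDR3-1600 -> PC3-12800
```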
Dual-channel-enabled memory controllers in a PC system architecture use two 64-bit data channels. Similarly, there are triple and quad channel architectures. Theoretically, dual-channel configurations double the memory bandwidth when compared to single-channel configurations, but this requires software that can exploit the increased parallelism offered by multi-channel memory configurations. This, Intel claims, leads to faster system performance as well as higher performance per watt.[3]
A DIMM or dual in-line memory module, commonly called a RAM stick, comprises a series of dynamic random-access memory integrated circuits. These modules are mounted on a printed circuit board and designed for use in personal computers, workstations and servers.[8]
Variants of DIMM slots support DDR, DDR2, DDR3, DDR4 and DDR5 RAM.
A DIMM’s capacity and other operational parameters are usually programmed by the manufacturer in the serial presence detect (SPD), an additional chip which contains information about the module type and timing so that the memory controller can be configured correctly. The SPD EEPROM connects to the System Management Bus and may also contain thermal sensors[8], usually referred to as TSOD, or Temperature Sensor on DIMM.
ECC DIMMs are those that have extra data bits which can be used by the system memory controller to detect and correct errors. There are numerous ECC schemes, but perhaps the most common is Single Error Correct, Double Error Detect (SECDED) which uses an extra byte per 64-bit word. ECC modules usually carry a multiple of 9 instead of a multiple of 8 chips.[8]
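To make SECDED concrete, here is a minimal sketch using a textbook extended Hamming code: 7 Hamming parity bits plus 1 overall parity bit protect a 64-bit word, giving the 8 extra bits (72 in total) mentioned above. This is an illustration only; the function names `encode` and `decode` are made up for this example, and real memory controllers use their own check-bit matrices, which may differ from this construction.

```python
def _is_data_position(pos: int) -> bool:
    """Positions that are not powers of two carry data bits."""
    return pos & (pos - 1) != 0


def encode(data: int, data_bits: int = 64) -> list:
    """Return a codeword as a list of bits; index 0 is the overall parity."""
    r = 0
    while (1 << r) < data_bits + r + 1:  # number of Hamming parity bits
        r += 1
    n = data_bits + r
    code = [0] * (n + 1)
    bit = 0
    for pos in range(1, n + 1):
        if _is_data_position(pos):
            code[pos] = (data >> bit) & 1
            bit += 1
    # Choose parity bits so the XOR of the positions of all set bits is 0.
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    for i in range(r):
        code[1 << i] = (syndrome >> i) & 1
    code[0] = sum(code) & 1  # make the total number of ones even
    return code


def decode(code: list) -> tuple:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) & 1
    status = "ok"
    if overall_odd:
        # Odd overall parity: assume a single flipped bit at `syndrome`
        # (syndrome 0 means the overall parity bit itself was flipped).
        if syndrome >= len(code):
            return None, "uncorrectable"
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even overall parity but non-zero syndrome: two bits flipped.
        return None, "uncorrectable"
    data, bit = 0, 0
    for pos in range(1, len(code)):
        if _is_data_position(pos):
            data |= code[pos] << bit
            bit += 1
    return data, status


word = 0xDEADBEEFCAFEF00D
codeword = encode(word)
print(len(codeword))         # 72 bits: 64 data + 8 check bits

single = list(codeword)
single[17] ^= 1              # inject a single-bit error
print(decode(single)[1])     # corrected

double = list(codeword)
double[3] ^= 1
double[40] ^= 1              # inject a second error
print(decode(double)[1])     # uncorrectable
```

A single flipped bit is located by the syndrome and repaired, while two flipped bits leave the overall parity even with a non-zero syndrome, which is detected but cannot be corrected.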
We will explore the concepts of errors on DIMMs in our next post. Ciao!
References:
- https://software.intel.com/content/www/us/en/develop/articles/new-reliability-availability-and-serviceability-ras-features-in-the-intel-xeon-processor.html
- https://en.wikipedia.org/wiki/Double_data_rate
- http://www.intel.com/Assets/PDF/prodbrief/x58-product-brief.pdf
- https://www.intel.com/content/dam/doc/application-note/e7500-chipset-mch-x4-single-device-data-correction-note.pdf
- https://en.wikipedia.org/wiki/Lockstep_(computing)#MEMORY
- https://www.youtube.com/watch?v=kIpQXWTGnHA
- https://www.systemverilog.io/ddr4-basics
- https://en.wikipedia.org/wiki/DIMM
- https://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
- https://arxiv.org/pdf/1904.09724.pdf
- Kim et al., “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” in ISCA, 2014
- https://en.wikipedia.org/wiki/Row_hammer
- https://software.intel.com/content/www/us/en/develop/articles/address-range-partial-memory-mirroring.html