How QR codes actually work
A QR code that’s missing 25% of its surface still scans. Cover a corner with a Coca-Cola logo and the phone camera reads it instantly. This isn’t because phones are smart; it’s because the QR encoder did most of the work years before anyone scanned it. Here’s what actually happens when you generate a QR code.
The visual anatomy
Every QR code has the same skeleton:
- Three finder patterns in the corners (top-left, top-right, bottom-left). Big square eyes; the scanner uses these to locate and orient the code.
- Alignment patterns in the interior (none for v1, more for larger versions). Small square dots; help the scanner re-orient when the code is on a curved or angled surface.
- Timing patterns — alternating black-and-white modules along row 6 and column 6, between the finders. The scanner uses these to count modules.
- Format info in two copies: one wrapped around the top-left finder, the other split between the top-right and bottom-left finders. Encodes ECC level + mask pattern, with its own error correction (BCH).
- Version info for v ≥ 7 — two 6×3 blocks near the top-right and bottom-left finders.
- Data + ECC modules in the remaining cells, placed in a snake pattern from bottom-right.
The “data + ECC” region contains the actual payload plus Reed-Solomon parity codewords for error correction.
The encoding pipeline
Generating a QR code is roughly seven stages:
- Pick mode (numeric / alphanumeric / byte / kanji). Byte handles any UTF-8.
- Pack data into a bit stream: 4-bit mode indicator + character count + data bytes.
- Pad to fit: terminator (0000), zero-pad to byte, then pad codewords (11101100 and 00010001 alternating).
- Pick smallest version (1–40, ranging from 21×21 to 177×177 modules) that fits the data + ECC for the chosen ECC level.
- Generate Reed-Solomon ECC for each block; interleave blocks.
- Lay out modules: function patterns first, then snake the data + ECC bits into the remaining cells.
- Apply mask + write format/version info: try all 8 masks, score each by penalty rules, pick the lowest-score result.
Each stage has constraints baked in by the spec. Get step 3’s padding wrong and the entire payload is unreadable.
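The packing and padding steps can be sketched as follows. This is a minimal illustration for byte mode only, not the generator's actual code: `build_codewords`, `capacity`, and `count_bits` are hypothetical names, and the caller is assumed to already know how many data codewords the chosen version and ECC level provide.

```python
def build_codewords(payload: bytes, capacity: int, count_bits: int = 8) -> list[int]:
    """Byte-mode data codewords for a symbol with `capacity` data codewords.

    count_bits=8 is the byte-mode character-count width for versions 1-9
    (an assumption of this sketch; larger versions use 16).
    """
    bits = "0100"                                       # byte-mode indicator
    bits += format(len(payload), f"0{count_bits}b")     # character count
    bits += "".join(format(b, "08b") for b in payload)  # the data bytes

    max_bits = capacity * 8
    if len(bits) > max_bits:
        raise ValueError("payload does not fit in this version/ECC level")

    bits += "0" * min(4, max_bits - len(bits))          # terminator, up to 4 zeros
    bits += "0" * (-len(bits) % 8)                      # zero-pad to a byte boundary

    codewords = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    pads = (0xEC, 0x11)                                 # 11101100, 00010001
    i = 0
    while len(codewords) < capacity:                    # alternate pad codewords
        codewords.append(pads[i % 2])
        i += 1
    return codewords


# A v1 symbol at level L holds 19 data codewords.
print([hex(c) for c in build_codewords(b"hi", 19)])
```

The first four codewords carry the header and payload; everything after that is the alternating pad pair.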
Why mode matters
The mode tells the scanner how to interpret the bit stream:
| Mode | Bits per character | When used |
|---|---|---|
| Numeric (0–9) | 3.33 (10 bits per 3 digits) | Phone numbers, IDs |
| Alphanumeric (0–9, A–Z, space, $%*+-./:) | 5.5 (11 bits per 2 chars) | URLs in caps, IDs |
| Byte | 8 | UTF-8 anything |
| Kanji | 13 | Japanese text |
A URL with lowercase letters falls back to byte mode (8 bits per character). The same URL in all-caps fits alphanumeric mode (5.5 bits per character), which takes about 31% fewer bits. Some QR libraries auto-pick mode; ours uses byte mode for everything for simplicity and UTF-8 support.
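To make the density difference concrete, here is a small sketch of mode selection and payload size. The names `pick_mode` and `data_bits` are illustrative, kanji mode is left out, and the count excludes the mode indicator and character-count header.

```python
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def pick_mode(text: str) -> str:
    """Densest of numeric / alphanumeric / byte that can hold the text."""
    if text and all(c in "0123456789" for c in text):
        return "numeric"
    if all(c in ALPHANUMERIC for c in text):
        return "alphanumeric"
    return "byte"


def data_bits(text: str) -> int:
    """Payload bits only (no mode indicator, no character count)."""
    mode = pick_mode(text)
    n = len(text)
    if mode == "numeric":
        # 10 bits per 3 digits; a leftover 1 or 2 digits take 4 or 7 bits
        return 10 * (n // 3) + (0, 4, 7)[n % 3]
    if mode == "alphanumeric":
        # 11 bits per pair; a leftover single character takes 6 bits
        return 11 * (n // 2) + 6 * (n % 2)
    return 8 * len(text.encode("utf-8"))


print(data_bits("https://example.com"))   # byte mode
print(data_bits("HTTPS://EXAMPLE.COM"))   # alphanumeric mode
```

For this 19-character URL the two calls give 152 and 105 bits respectively.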
Reed-Solomon error correction (the magic)
The key insight: the QR code carries the original data plus mathematically generated parity codewords that let the scanner recover from a bounded number of errors per block. The math is over GF(256), the 256-element Galois field that AES also works in, although QR reduces by x^8 + x^4 + x^3 + x^2 + 1 (0x11D) where AES uses 0x11B.
Concretely:
- Treat each codeword (1 byte = 1 element of GF(256)) as a coefficient of a polynomial.
- Multiply the data polynomial by x^k (where k = number of ECC codewords).
- Compute the remainder when divided by a generator polynomial g(x) = (x - α^0)(x - α^1)...(x - α^(k-1)). This remainder is the ECC codewords.
- Append the ECC to the data.
When the scanner reads back the (possibly damaged) codewords, it performs the same polynomial division. If the remainder isn’t zero, errors are present. The Berlekamp-Massey algorithm (or a simpler syndrome decoder) locates and corrects up to k/2 errors (rounded down) in each block.
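The encoding side fits in a few dozen lines. The sketch below is illustrative rather than the generator's own code: all function names are made up for this example, the field uses the reducing polynomial 0x11D, and the generator's roots are taken as α^0 through α^(k-1), the convention QR encoders use. It stops at error detection (syndromes); locating and correcting errors is left out.

```python
def _build_tables():
    """Antilog/log tables for GF(256) with reducing polynomial 0x11D."""
    exp, log = [0] * 512, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
    for i in range(255, 512):        # doubled table avoids a modulo in gf_mul
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = _build_tables()


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly(k: int) -> list[int]:
    """Coefficients of (x - a^0)(x - a^1)...(x - a^(k-1)), highest degree first."""
    g = [1]
    for i in range(k):
        nxt = g + [0]                # g(x) * x
        for j, coef in enumerate(g): # plus g(x) * a^i (subtraction is XOR here)
            nxt[j + 1] ^= gf_mul(coef, EXP[i])
        g = nxt
    return g


def rs_ecc(data: list[int], k: int) -> list[int]:
    """Remainder of data(x) * x^k divided by the generator polynomial."""
    g = generator_poly(k)
    rem = list(data) + [0] * k
    for i in range(len(data)):
        factor = rem[i]
        if factor:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], factor)
    return rem[len(data):]


def syndromes(codewords: list[int], k: int) -> list[int]:
    """Evaluate the received polynomial at a^0..a^(k-1) with Horner's rule."""
    out = []
    for i in range(k):
        acc = 0
        for c in codewords:
            acc = gf_mul(acc, EXP[i]) ^ c
        out.append(acc)
    return out


# Arbitrary example block: 16 data codewords, 10 ECC codewords.
data = [32, 91, 11, 120, 209, 114, 220, 77, 67, 64, 236, 17, 236, 17, 236, 17]
ecc = rs_ecc(data, 10)
print(ecc)
print(syndromes(data + ecc, 10))     # all zeros for an undamaged block
```

Checking the syndromes is equivalent to checking that the remainder is zero: an undamaged block is a multiple of g(x), so it evaluates to zero at every root of g.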
ECC levels
QR has four ECC levels:
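| Level | Recovery | k as % of total (at v40) |
|---|---|---|
| L | ~7% | ~20% |
| M | ~15% | ~37% |
| Q | ~25% | ~55% |
| H | ~30% | ~66% |
The parity share is at least double the recovery rate because locating and fixing one unknown error uses up two ECC codewords.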
H-level is what lets you slap a logo in the centre and still scan.
Block interleaving
Bigger QR codes split the data into multiple Reed-Solomon blocks. A v10 QR at level Q has 8 blocks; v40 at level H has 81 blocks. The encoder generates ECC per block, then interleaves them codeword by codeword:
Block 1: D1 D2 D3 ... + E1 E2 ...
Block 2: D1' D2' D3' ... + E1' E2' ...
Block 3: D1'' ...
Interleaved: D1 D1' D1'' D2 D2' D2'' ... E1 E1' E1'' E2 ...
This way, a localised burst of errors (e.g., a coffee stain across one region) damages a few codewords from each block instead of wiping out one block entirely. Each block can independently correct its share.
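A minimal sketch of the interleaving step, assuming each block is simply a list of codewords. The function names are illustrative. Blocks in the same symbol can differ in length by one data codeword, so shorter blocks drop out of the rotation once they are exhausted.

```python
from itertools import zip_longest


def interleave(blocks: list[list[int]]) -> list[int]:
    """Take one codeword from each block in turn; exhausted blocks are skipped."""
    out = []
    for column in zip_longest(*blocks):
        out.extend(c for c in column if c is not None)
    return out


def interleave_symbol(data_blocks: list[list[int]], ecc_blocks: list[list[int]]) -> list[int]:
    """All interleaved data codewords first, then all interleaved ECC codewords."""
    return interleave(data_blocks) + interleave(ecc_blocks)


print(interleave([[1, 2, 3], [4, 5, 6, 7]]))
```

The scanner undoes this by dealing the codewords back out to the blocks in the same order.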
The 8 masks
After laying out data, the encoder applies a mask to data modules only (not function patterns). Each mask is a simple geometric pattern based on the module’s column x and row y:
0: (x + y) % 2 == 0
1: y % 2 == 0
2: x % 3 == 0
3: (x + y) % 3 == 0
4: (floor(x/3) + floor(y/2)) % 2 == 0
5: (x*y % 2 + x*y % 3) == 0
6: ((x*y % 2) + (x*y % 3)) % 2 == 0
7: (((x + y) % 2) + (x*y % 3)) % 2 == 0
Where mask is true, the data bit is XORed (flipped). Why? To break up patterns that look like finder patterns or all-same-colour areas, which would confuse the scanner.
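As a sketch, the eight conditions translate directly into predicates, with x as the column and y as the row and mask 7 parenthesised as ((x + y) % 2 + (x * y) % 3) % 2. The `grid` and `is_function` layouts here are assumptions for illustration: both are lists of rows, and `is_function[y][x]` is true for finder, timing, alignment, format, and version modules.

```python
MASKS = [
    lambda x, y: (x + y) % 2 == 0,
    lambda x, y: y % 2 == 0,
    lambda x, y: x % 3 == 0,
    lambda x, y: (x + y) % 3 == 0,
    lambda x, y: (x // 3 + y // 2) % 2 == 0,
    lambda x, y: (x * y) % 2 + (x * y) % 3 == 0,
    lambda x, y: ((x * y) % 2 + (x * y) % 3) % 2 == 0,
    lambda x, y: ((x + y) % 2 + (x * y) % 3) % 2 == 0,
]


def apply_mask(grid, is_function, mask_id):
    """Return a copy of grid with data modules flipped wherever the mask is true."""
    mask = MASKS[mask_id]
    return [
        [bit ^ 1 if not is_function[y][x] and mask(x, y) else bit
         for x, bit in enumerate(row)]
        for y, row in enumerate(grid)
    ]


blank = [[0, 0], [0, 0]]
no_function = [[False, False], [False, False]]
print(apply_mask(blank, no_function, 0))
```

Because masking is an XOR, applying the same mask a second time restores the original bits, which is how the scanner removes it.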
The encoder applies all 8 masks, scores each result by 4 penalty rules:
- Runs of 5+ same-colour modules in a row or column (3 points for a run of 5, plus 1 for each extra module).
- 2×2 blocks of same-colour modules (3 each).
- Patterns that look like a finder pattern (40 each).
- Imbalance between dark and light modules (10 per 5% off 50/50).
The mask with the lowest total score wins. This is why QR codes for the same content from two different generators can look different: any difference in mode, version, or ECC level changes the module layout, and with it the winning mask.
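The four rules can be sketched as a single scoring function. This is an illustrative reading of the rules above, not the generator's code: `penalty` is a made-up name, the grid is a square list of rows of 0/1 (1 = dark), and the finder-like pattern in rule 3 is taken to be 1:1:3:1:1 with four light modules on either side.

```python
def penalty(grid: list[list[int]]) -> int:
    n = len(grid)
    rows = [list(r) for r in grid]
    cols = [[grid[y][x] for y in range(n)] for x in range(n)]
    score = 0

    # Rule 1: runs of 5+ same-colour modules, 3 points plus 1 per extra module.
    for line in rows + cols:
        run = 1
        for prev, cur in zip(line, line[1:]):
            if cur == prev:
                run += 1
            else:
                if run >= 5:
                    score += 3 + (run - 5)
                run = 1
        if run >= 5:
            score += 3 + (run - 5)

    # Rule 2: every 2x2 block of one colour (overlapping blocks all count).
    for y in range(n - 1):
        for x in range(n - 1):
            if grid[y][x] == grid[y][x + 1] == grid[y + 1][x] == grid[y + 1][x + 1]:
                score += 3

    # Rule 3: finder-like 1:1:3:1:1 pattern next to four light modules.
    a = [1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0]
    b = a[::-1]
    for line in rows + cols:
        for i in range(n - 10):
            if line[i:i + 11] in (a, b):
                score += 40

    # Rule 4: 10 points for each full 5% the dark share is away from 50%.
    dark_percent = 100 * sum(map(sum, grid)) / (n * n)
    score += 10 * int(abs(dark_percent - 50) // 5)
    return score


print(penalty([[0] * 5 for _ in range(5)]))
```

An encoder would call this once per masked candidate and keep the mask with the smallest result.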
Format info
The 5 bits encoding “ECC level + chosen mask” need to be readable before the rest of the code, so they’re stored with their own error correction:
- 5 data bits + 10 BCH(15,5) parity bits = 15 bits total
- XORed with
0x5412(a fixed bitmask) to ensure all-zero data still produces a non-zero pattern (otherwise the format strip would be all light, looking like a separator) - Stored in two regions so even if one is damaged, the other is readable
The scanner reads both copies, checks each against the BCH polynomial, and picks the most likely original 5 bits.
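A sketch of both directions, under a few stated assumptions: the function names are made up, the BCH generator is the QR specification's 0x537, and the two-bit ECC level codes are L = 01, M = 00, Q = 11, H = 10. The decoder here uses brute force over all 32 valid words rather than real BCH decoding, which is enough to show the idea.

```python
def format_bits(ecc_level_bits: int, mask_id: int) -> int:
    """15-bit format word: 5 data bits + 10 BCH parity bits, XORed with 0x5412."""
    data = (ecc_level_bits << 3) | mask_id
    rem = data << 10
    for shift in range(4, -1, -1):          # polynomial division by 0x537
        if rem & (1 << (shift + 10)):
            rem ^= 0x537 << shift
    return ((data << 10) | rem) ^ 0x5412


def decode_format(raw: int) -> tuple[int, int]:
    """Pick the valid format word closest to `raw` in Hamming distance."""
    best = min(
        range(32),
        key=lambda d: bin(raw ^ format_bits(d >> 3, d & 7)).count("1"),
    )
    return best >> 3, best & 7              # (ecc_level_bits, mask_id)


word = format_bits(0b01, 0)                 # level L, mask 0
print(format(word, "015b"))
print(decode_format(word ^ 0b100000100000001))   # three bits flipped
```

Valid format words differ from each other in at least 7 bit positions, so up to 3 flipped bits still land closest to the right one.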
Version info
For QR versions 7 through 40, the version number is stored in two 6×3 blocks with 18-bit BCH(18,6) error correction:
- 6 data bits (the version number, 7–40) + 12 BCH parity bits
- Stored in two locations near the top-right and bottom-left finders
- Same redundancy logic as format info
For versions 1–6, the version is implicit in the QR’s overall size (17 + 4×ver modules per side), so no version-info region is needed.
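Both the size rule and the version word are short enough to sketch. The side length used here is 17 + 4 × version, which gives 21 modules for v1 and 177 for v40. The function names are illustrative, and 0x1F25 is the BCH(18,6) generator from the QR specification.

```python
def side_length(version: int) -> int:
    """Modules per side for versions 1-40."""
    return 17 + 4 * version


def version_bits(version: int) -> int:
    """18-bit version word: 6 data bits followed by 12 BCH parity bits."""
    if not 7 <= version <= 40:
        raise ValueError("version info only exists for versions 7-40")
    rem = version << 12
    for shift in range(5, -1, -1):          # polynomial division by 0x1F25
        if rem & (1 << (shift + 12)):
            rem ^= 0x1F25 << shift
    return (version << 12) | rem


print(side_length(1), side_length(40))
print(format(version_bits(7), "018b"))
```

Unlike format info, the version word is not XORed with a mask, since the smallest stored version (7) is already non-zero.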
What scanners actually do
In reverse:
- Find the three finder patterns: edge-detect, look for the 1:1:3:1:1 dark/light/dark/light/dark module ratio of a finder.
- Calculate orientation + perspective from the finder positions.
- Read alignment patterns to refine the perspective (especially important for curved surfaces or oblique angles).
- Read format info (twice; pick the best). This tells the scanner the ECC level and which mask was applied.
- Read version info if it’s a v7+ QR.
- Sample modules at the calculated grid positions.
- Reverse the mask to recover the unmasked data + ECC bits.
- De-interleave blocks.
- Run Reed-Solomon decode on each block; correct errors.
- Concatenate data, peel off mode + length headers, decode.
That’s the whole pipeline. Most of it runs in under 100 ms on a phone.
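The first step, spotting a finder pattern, can be illustrated on a single scanline. This is a simplified sketch with made-up function names: it assumes the image is already binarised to 0/1 pixels (1 = dark), and the 50% tolerance per module is an arbitrary choice for the example. A real scanner would confirm each hit by checking the same ratio vertically.

```python
def run_lengths(line: list[int]) -> list[list[int]]:
    """Collapse a scanline of 0/1 pixels into [value, length] runs."""
    runs = []
    for px in line:
        if runs and runs[-1][0] == px:
            runs[-1][1] += 1
        else:
            runs.append([px, 1])
    return runs


def looks_like_finder(lengths: list[int], tolerance: float = 0.5) -> bool:
    """Check five run lengths (dark first) against the 1:1:3:1:1 ratio."""
    unit = sum(lengths) / 7                 # estimated module width in pixels
    expected = (1, 1, 3, 1, 1)
    return all(abs(l - e * unit) <= tolerance * unit
               for l, e in zip(lengths, expected))


def finder_candidates(line: list[int]) -> list[int]:
    """Pixel offsets where a dark/light/dark/light/dark 1:1:3:1:1 window starts."""
    runs = run_lengths(line)
    hits, pos = [], 0
    for i, (value, length) in enumerate(runs):
        window = runs[i:i + 5]
        if value == 1 and len(window) == 5 and looks_like_finder([r[1] for r in window]):
            hits.append(pos)
        pos += length
    return hits


# A finder crossed through its centre at 2 pixels per module, with light margins.
scan = [0] * 4 + [1] * 2 + [0] * 2 + [1] * 6 + [0] * 2 + [1] * 2 + [0] * 4
print(finder_candidates(scan))
```

Because the test is on ratios rather than absolute widths, it works at any scale and along any line through the finder's centre.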
Designing for scannability
A few practical takeaways:
- Higher ECC = denser QR. Don’t go above M unless you have a reason. The “unscannable QR with H + logo” is a real failure mode when the print quality is bad.
- Quiet zone (white border around the code) is part of the spec — 4 modules wide. Cropping into it breaks scanning.
- Aspect ratio must be 1:1. No stretching.
- Contrast between dark and light modules should be high. Pure black on white scans best. Custom colours (foreground/background) work as long as the contrast is preserved.
- Module size affects scannability at distance. Rule of thumb: each module needs to be at least 1 px on the scanning camera’s sensor at the scanning distance. For a phone at arm’s length, that’s roughly 1 mm per module, so a v3 QR (29 modules across) needs to be ~3 cm.
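The sizing arithmetic from the last two points can be written down as a small helper. This is a back-of-the-envelope sketch: `min_print_size_mm` is a hypothetical name, and the 1 mm module is the arm's-length rule of thumb from above, not a measured value.

```python
def min_print_size_mm(version: int, module_mm: float = 1.0, quiet_zone: int = 4):
    """Return (code width, width including quiet zone) in millimetres."""
    modules = 17 + 4 * version              # modules per side
    code = modules * module_mm
    total = (modules + 2 * quiet_zone) * module_mm
    return code, total


print(min_print_size_mm(3))
```

For v3 that is 29 mm for the code itself and 37 mm once the 4-module quiet zone is included on both sides.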
Try the generator
The QR generator on this site implements all of the above: Reed-Solomon ECC, mask scoring, format info with BCH parity, the works. Open the page, paste a URL, and the SVG renders instantly — entirely in your browser. The “version” and “mask” shown under the QR are the chosen output of the algorithm above.
Related across the network
- base64.tooljo.com: for the related case of getting binary data into text-friendly form.
- hash.tooljo.com: hashes detect corrupted data, where Reed-Solomon goes further and repairs it.
- guid.tooljo.com: UUIDs are another “encode complex info into a stable string” problem.