BER TLV Encoding
BER-TLV encoding is a fundamental way to serialize structured data into bytes, especially in systems dealing with digital certificates, smart cards, and identity credentials. It comes from ASN.1 (Abstract Syntax Notation One), which defines data types abstractly — things like integers, strings, sequences, or sets — without tying them to a specific programming language or platform. To actually transmit or store that data, you need concrete encoding rules, and Basic Encoding Rules (BER) is the original, most flexible set of those rules. The "TLV" part describes the core pattern: every piece of data is broken into a Tag (what it is), a Length (how big the content is), and a Value (the content itself). This pattern repeats recursively, so complex nested structures naturally emerge.
The beauty of TLV is that it is self-describing. When you receive a byte stream, you read the tag to know the type, read the length to know where the element ends, extract the value, and move on to the next TLV. Constructed types (like a SEQUENCE) have a value that is simply a concatenation of more TLV elements. This makes parsing straightforward, but BER also allows ambiguity: the same logical data can have several valid byte representations (for example, a length written in long form when the short form would fit, an indefinite instead of a definite length, a string split into constructed segments, or any non-zero byte standing for BOOLEAN TRUE).
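To make that ambiguity concrete, here is a small sketch (the helper name read_tlv is hypothetical, and it handles only single-byte tags and definite lengths) that decodes three different byte strings to the same INTEGER 5. Only the first uses the shortest length form; the other two are still valid BER.

```python
def read_tlv(data: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Read one TLV (single-byte tag, definite length) starting at data[pos].

    Returns (tag, value, next_pos). Any valid definite length form is
    accepted, including long forms a minimal encoder would never emit.
    """
    tag = data[pos]
    first = data[pos + 1]
    pos += 2
    if first < 0x80:        # short form: the byte is the length
        length = first
    elif first == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    else:                   # long form: low 7 bits = number of length bytes
        n = first & 0x7F
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    return tag, data[pos:pos + length], pos + length


# Three encodings of INTEGER 5 that a BER decoder must all accept.
short_form = bytes.fromhex("020105")       # short-form length
long_form = bytes.fromhex("02810105")      # long form where short would fit
padded_long = bytes.fromhex("0282000105")  # long form with a leading zero byte

decoded = [read_tlv(b)[:2] for b in (short_form, long_form, padded_long)]
print(decoded)  # [(2, b'\x05'), (2, b'\x05'), (2, b'\x05')]
```

The decoded values are identical while the byte strings differ, which is exactly why anything that hashes or signs encoded bytes needs a stricter rule set.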
How the parts work in practice:
The Tag occupies one or more bytes. The first byte splits into:
- Bits 8–7 for the class (universal, application, context-specific, or private).
- Bit 6 to indicate primitive (0) or constructed (1).
- Bits 5–1 for the tag number. If those bits are all 1s (31 decimal, 0x1F), the tag number continues in the following bytes, 7 bits per byte in big-endian order, with bit 8 set on every byte except the last.
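The tag layout maps directly onto a few bit masks. Below is a minimal sketch with a hypothetical decode_tag helper (Python 3.9+). The sample inputs: 0x30 is the SEQUENCE tag from the example later on, A0 is the context-specific [0] tag that wraps the version field inside an X.509 tbsCertificate, and 5F 2A is the two-byte EMV tag for Transaction Currency Code.

```python
CLASSES = ("universal", "application", "context-specific", "private")


def decode_tag(data: bytes, pos: int = 0) -> tuple[str, bool, int, int]:
    """Decode a BER identifier (tag) starting at data[pos].

    Returns (class_name, constructed, tag_number, next_pos).
    """
    first = data[pos]
    tag_class = CLASSES[first >> 6]      # bits 8-7
    constructed = bool(first & 0x20)     # bit 6
    number = first & 0x1F                # bits 5-1
    pos += 1
    if number == 0x1F:
        # High-tag-number form: base-128 digits, bit 8 set on all but the last byte.
        number = 0
        while True:
            b = data[pos]
            pos += 1
            number = (number << 7) | (b & 0x7F)
            if not b & 0x80:
                break
    return tag_class, constructed, number, pos


print(decode_tag(bytes.fromhex("30")))    # ('universal', True, 16, 1)
print(decode_tag(bytes.fromhex("A0")))    # ('context-specific', True, 0, 1)
print(decode_tag(bytes.fromhex("5F2A")))  # ('application', False, 42, 2)
```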
The Length can be definite (most common) or indefinite. For definite:
- If the first byte is below 128 (0x80), it’s the short form: the byte itself is the length (0 to 127).
- If the first byte has bit 8 set (and is not 0x80), it’s the long form: the lower 7 bits say how many following bytes encode the actual length as a big-endian integer. The value 0xFF is reserved.
Indefinite length is signaled by the single byte 0x80, is allowed only for constructed encodings, and ends with a special End-of-Contents (EOC) marker (00 00).
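Here is a sketch of decoding and encoding the length field. The helper names are hypothetical, and decode_length returns None for the indefinite form. encode_length always picks the shortest form, which is what DER insists on.

```python
from typing import Optional


def decode_length(data: bytes, pos: int) -> tuple[Optional[int], int]:
    """Decode the BER length field at data[pos]; return (length, next_pos).

    length is None for the indefinite form (a lone 0x80 byte).
    """
    first = data[pos]
    pos += 1
    if first < 0x80:                 # short form: 0..127
        return first, pos
    if first == 0x80:                # indefinite: content runs to 00 00
        return None, pos
    if first == 0xFF:
        raise ValueError("0xFF is reserved as the first length byte")
    n = first & 0x7F                 # long form: n big-endian bytes follow
    if pos + n > len(data):
        raise ValueError("truncated length field")
    return int.from_bytes(data[pos:pos + n], "big"), pos + n


def encode_length(length: int) -> bytes:
    """Encode a definite length using the shortest possible form."""
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


print(encode_length(5).hex())                      # 05
print(encode_length(200).hex())                    # 81c8
print(decode_length(bytes.fromhex("8203E8"), 0))   # (1000, 3)
```

For instance, 200 does not fit in 7 bits, so it becomes 81 C8: the 0x81 says one length byte follows, and that byte is 0xC8 = 200.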
The Value is exactly “length” bytes (or everything until EOC for indefinite). For primitive types it’s the raw data; for constructed it’s more TLVs.
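With indefinite lengths you cannot simply search for the first 00 00, because those bytes may legally appear inside a nested element’s value. The sketch below (a hypothetical skip_tlv helper) finds where a TLV ends by walking its children recursively until it meets the EOC marker at its own level. The sample is a constructed OCTET STRING (tag 0x24) in indefinite form whose first fragment happens to contain 00 00.

```python
def skip_tlv(data: bytes, pos: int) -> int:
    """Return the offset just past the TLV that starts at data[pos].

    Handles multi-byte tags, short and long definite lengths, and
    (nested) indefinite lengths.
    """
    # Tag: one byte, plus continuation bytes in the high-tag-number form.
    first = data[pos]
    pos += 1
    if (first & 0x1F) == 0x1F:
        while data[pos] & 0x80:
            pos += 1
        pos += 1
    # Length.
    lb = data[pos]
    pos += 1
    if lb == 0x80:
        # Indefinite: child TLVs until End-of-Contents (00 00) at this level.
        while data[pos:pos + 2] != b"\x00\x00":
            if pos >= len(data):
                raise ValueError("missing End-of-Contents")
            pos = skip_tlv(data, pos)
        return pos + 2
    if lb & 0x80:
        n = lb & 0x7F
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    else:
        length = lb
    if pos + length > len(data):
        raise ValueError("value runs past end of data")
    return pos + length


# Constructed OCTET STRING (0x24), indefinite length, two fragments:
# 04 02 00 00 and 04 01 41, then End-of-Contents.
data = bytes.fromhex("24 80 0402 0000 0401 41 0000")
print(skip_tlv(data, 0))          # 11: the real end of the element
print(data.find(b"\x00\x00", 2))  # 4: a naive search stops inside the first fragment
```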
These diagrams illustrate the length field rules clearly — one general BER, one from the EMV specification (which uses BER-TLV heavily for payment card data).
Other Variants
Other encoding rules build on or restrict BER to solve specific problems:
- DER (Distinguished Encoding Rules): a strict subset of BER. It eliminates ambiguity by allowing exactly one encoding per value: minimal length encoding, definite lengths only, primitive (unsegmented) strings, BOOLEAN TRUE as 0xFF, SET components in a defined sort order, and unused BIT STRING bits set to zero. This canonical form is critical wherever signatures are involved, because the bytes being signed must come out identical across implementations. Almost all X.509 certificates, CMS/PKCS#7 structures, and modern PKI credential systems use DER.
- CER (Canonical Encoding Rules): also canonical, but it uses indefinite lengths for constructed encodings and splits strings longer than 1000 octets into fragments, so an encoder can start emitting a very large value before knowing its total size (streaming scenarios). Rarely seen today outside legacy systems.
- PER (Packed Encoding Rules): focuses on compactness. It omits tags entirely (the schema tells the decoder what comes next), uses constraints to shrink or drop lengths, packs fields into bits (aligned or unaligned variants), and is typically much smaller than BER/DER. Used in bandwidth-constrained telecom protocols (e.g., LTE/5G signaling).
- OER (Octet Encoding Rules): a newer compact encoding that keeps every field on octet boundaries, trading a little size for simpler, faster processing than PER. Used in automotive and IoT settings, for example IEEE 1609.2 V2X security certificates.
- JER (JSON Encoding Rules): encodes ASN.1 values as JSON, useful for web-based credential systems.
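DER’s "minimal length, definite lengths only" rules become a handful of checks a strict parser can run on every length field. The sketch below uses a hypothetical read_der_length helper that raises ValueError on length encodings BER accepts but DER forbids.

```python
def read_der_length(data: bytes, pos: int) -> tuple[int, int]:
    """Read a length field at data[pos] under DER rules; return (length, next_pos).

    Raises ValueError for length encodings BER allows but DER forbids.
    """
    first = data[pos]
    pos += 1
    if first < 0x80:
        return first, pos
    if first == 0x80:
        raise ValueError("DER forbids indefinite lengths")
    if first == 0xFF:
        raise ValueError("0xFF is reserved")
    n = first & 0x7F
    body = data[pos:pos + n]
    if len(body) != n:
        raise ValueError("truncated length field")
    if body[0] == 0x00:
        raise ValueError("DER forbids leading zero bytes in a long-form length")
    length = int.from_bytes(body, "big")
    if length < 0x80:
        raise ValueError("DER requires the short form for lengths below 128")
    return length, pos + n


for hex_len in ("05", "81C8", "8105", "820005", "80"):
    try:
        print(hex_len, "->", read_der_length(bytes.fromhex(hex_len), 0))
    except ValueError as err:
        print(hex_len, "-> rejected:", err)
```

A plain BER decoder would accept all five inputs; only 05 and 81 C8 survive the DER checks.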
In your domain, you’ll encounter BER-TLV most often in smart-card applets (ISO 7816, EMV contactless payments) and DER in traditional PKI certificates.
Practical Part
Let’s look at a tiny real-world-style example you can try yourself.
Consider the ASN.1 structure:
MyData ::= SEQUENCE {
version INTEGER,
name OCTET STRING
}
With values version = 1, name = "Alice".
In DER (what a real certificate would use):
Hex: 30 0A 02 01 01 04 05 41 6C 69 63 65
Breakdown:
- 30 = tag for SEQUENCE (universal constructed 16)
- 0A = length 10 bytes
- 02 01 01 = INTEGER 1 (tag 02, length 1, value 01)
- 04 05 41 6C 69 63 65 = OCTET STRING "Alice"
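Going the other way, a small encoder reproduces those exact bytes. The helpers tlv, encode_length, der_integer, and encode_my_data are hypothetical and cover only this MyData example, not general ASN.1. The INTEGER helper emits the minimal two’s-complement form, so 128 would become 02 02 00 80 (the leading 00 keeps it positive).

```python
def encode_length(n: int) -> bytes:
    """Shortest definite length encoding."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Concatenate a single-byte tag, its length, and the value."""
    return bytes([tag]) + encode_length(len(value)) + value


def der_integer(n: int) -> bytes:
    """INTEGER (tag 0x02) with minimal two's-complement content bytes."""
    size = (n + 1 if n < 0 else n).bit_length() // 8 + 1
    return tlv(0x02, n.to_bytes(size, "big", signed=True))


def encode_my_data(version: int, name: bytes) -> bytes:
    """MyData ::= SEQUENCE { version INTEGER, name OCTET STRING }"""
    body = der_integer(version) + tlv(0x04, name)  # 0x04 = OCTET STRING
    return tlv(0x30, body)  # 0x30 = universal, constructed, 16 (SEQUENCE)


encoded = encode_my_data(1, b"Alice")
print(encoded.hex(" ").upper())  # 30 0A 02 01 01 04 05 41 6C 69 63 65
```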
Here’s a minimal Python function that walks a run of simple TLVs (single-byte tags, definite lengths only). It is a good starting point you can extend for multi-byte tags or indefinite lengths:
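def parse_tlv(data: bytes) -> None:
    """Print each top-level TLV in data (single-byte tags, definite lengths)."""
    pos = 0
    while pos < len(data):
        tag = data[pos]  # single-byte tag assumed (no high-tag-number form)
        pos += 1
        # Length
        if data[pos] & 0x80:  # long form: low 7 bits = number of length bytes
            len_len = data[pos] & 0x7F
            if len_len == 0:
                raise ValueError("indefinite length (0x80) not supported")
            pos += 1
            length = int.from_bytes(data[pos:pos + len_len], "big")
            pos += len_len
        else:  # short form: the byte is the length
            length = data[pos]
            pos += 1
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("value runs past end of data")
        pos += length
        print(f"Tag: 0x{tag:02X}, Length: {length}, Value: {value.hex()}")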
Run it on the example above:
data = bytes.fromhex("300A0201010405416C696365")
parse_tlv(data)
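Output (a single line, because the loop only walks the top level and the whole input is one SEQUENCE):
Tag: 0x30, Length: 10, Value: 0201010405416c696365
The outer value contains the inner TLVs. Passing that value back in, parse_tlv(bytes.fromhex("0201010405416c696365")), prints them:
Tag: 0x02, Length: 1, Value: 01
Tag: 0x04, Length: 5, Value: 416c696365
To handle constructed types automatically, recurse whenever the tag’s constructed bit (0x20) is set.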
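A recursive version, sketched with hypothetical parse_tree and dump helpers: parse_tree builds a nested list, descending into any tag whose constructed bit (0x20) is set, and dump prints it with indentation. Like parse_tlv, it assumes single-byte tags and definite lengths.

```python
def parse_tree(data: bytes) -> list:
    """Parse data into a list of (tag, content) pairs.

    content is a list of child pairs for constructed tags (bit 6 set)
    and the raw value bytes for primitive tags.
    """
    nodes = []
    pos = 0
    while pos < len(data):
        tag = data[pos]
        first = data[pos + 1]
        pos += 2
        if first & 0x80:  # long form
            n = first & 0x7F
            if n == 0:
                raise ValueError("indefinite length not handled in this sketch")
            length = int.from_bytes(data[pos:pos + n], "big")
            pos += n
        else:             # short form
            length = first
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("value runs past end of data")
        pos += length
        content = parse_tree(value) if tag & 0x20 else value
        nodes.append((tag, content))
    return nodes


def dump(nodes: list, depth: int = 0) -> None:
    """Print the tree with two spaces of indentation per nesting level."""
    for tag, content in nodes:
        pad = "  " * depth
        if isinstance(content, list):
            print(f"{pad}Tag: 0x{tag:02X} (constructed)")
            dump(content, depth + 1)
        else:
            print(f"{pad}Tag: 0x{tag:02X}, Length: {len(content)}, Value: {content.hex()}")


dump(parse_tree(bytes.fromhex("300A0201010405416C696365")))
# Tag: 0x30 (constructed)
#   Tag: 0x02, Length: 1, Value: 01
#   Tag: 0x04, Length: 5, Value: 416c696365
```

Returning a tree instead of printing directly keeps the parsing testable and lets you look fields up by position afterwards.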
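For a real certificate, fetch one from any TLS server. For example, openssl s_client -connect www.google.com:443 -showcerts </dev/null prints the server’s certificate chain; save the first block, from -----BEGIN CERTIFICATE----- through -----END CERTIFICATE-----, as cert.pem (or pipe the s_client output through openssl x509 -out cert.pem, which keeps just the first certificate), then run: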
openssl asn1parse -inform pem -in cert.pem
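You’ll see the full nested TLV structure, one element per line with its offset, depth (d=), header length (hl=), content length (l=), primitive/constructed flag, and tag name. That is the same structure a credential system walks when it parses identity data. Try it on a few different sites and compare fields like the subject name (a SEQUENCE OF RelativeDistinguishedName, each of which is a SET of attribute type/value pairs).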
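To poke at the same file from Python, here is a sketch with hypothetical helpers pem_to_der, read_header, and certificate_outline. It base64-decodes the first CERTIFICATE block and lists the children of the outer SEQUENCE. For an X.509 certificate you would expect three: tbsCertificate (0x30), signatureAlgorithm (0x30), and signatureValue (a BIT STRING, 0x03). Only the top level is walked, where single-byte tags and definite lengths are safe assumptions for DER.

```python
import base64


def pem_to_der(pem_text: str) -> bytes:
    """Base64-decode the first CERTIFICATE block in a PEM string."""
    lines = [line.strip() for line in pem_text.splitlines()]
    start = lines.index("-----BEGIN CERTIFICATE-----") + 1
    end = lines.index("-----END CERTIFICATE-----", start)
    return base64.b64decode("".join(lines[start:end]))


def read_header(data: bytes, pos: int) -> tuple[int, int, int]:
    """Return (tag, length, value_start) for the TLV at data[pos]."""
    tag = data[pos]
    first = data[pos + 1]
    pos += 2
    if first & 0x80:  # long form (certificates are usually > 127 bytes)
        n = first & 0x7F
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    else:
        length = first
    return tag, length, pos


def certificate_outline(der: bytes) -> list[tuple[int, int]]:
    """List (tag, length) for each child of the outer Certificate SEQUENCE."""
    tag, length, pos = read_header(der, 0)
    if tag != 0x30:
        raise ValueError("a certificate should start with a SEQUENCE (0x30)")
    end = pos + length
    children = []
    while pos < end:
        child_tag, child_len, value_start = read_header(der, pos)
        children.append((child_tag, child_len))
        pos = value_start + child_len
    return children


# Usage, assuming cert.pem was saved as described above:
# with open("cert.pem") as f:
#     for tag, length in certificate_outline(pem_to_der(f.read())):
#         print(f"0x{tag:02X} length={length}")
```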