A compact, canonical binary serializer.
Human-readable when you need it.
Surp is a Rust-backed serialization toolkit: a stable, block-framed v1 binary format with XXH64 checksums and optional compression, paired with a text notation, a CLI, native Python bindings, and a C ABI. The additive RFC-001 work lives in its own namespace.
{
  id: 1001;
  name: "Alice";
  active: true;
  tags: ["admin", "ops"];
  settings: {
    theme: "dark";
    region: "us";
  };
  avatar: b64#AQID;
}
Four surfaces. One codec underneath.
Stable v1 wire format
Block-framed files with per-block XXH64 checksums and a trailer checksum. Optional string deduplication. Forward- and backward-compatible schema evolution.
Zero-copy when it can
Owned decode through Value, borrowed zero-copy decode through SurpValue<'a> for uncompressed v1 data. Resource limits enforced before any allocation.
CLI, Python, Rust, C ABI
The same codec drives a binary tool, a native PyO3 Python package, the public Rust API, and small C ABI helpers for JSON↔Surp buffers.
An additive next-gen path
CTN text + CBF binary + a baseline CQL path query engine, under surp_core::rfc001 and surp.rfc001. Separate namespace, separate file type.
Encode and decode without ceremony.
A small surface area: Value, Encoder, Decoder, and the borrowed SurpValue<'a>. Derive macros cover named structs.
use surp_core::{Decoder, Encoder, Value};
let value = Value::Object(vec![
    ("name".into(), Value::Str("Alice".into())),
    ("age".into(), Value::UInt(30)),
    ("active".into(), Value::Bool(true)),
]);
let mut encoder = Encoder::new();
encoder.encode_value(&value)?;
let bytes = encoder.finish()?;
let mut decoder = Decoder::new(&bytes);
let decoded = decoder.decode_next()?.to_owned_value();
assert_eq!(decoded, value);
A package called surp. Nothing else to learn.
PyO3 module with the obvious dumps/loads, plus a typed SurpValue view for discoverable access.
import surp
payload = {
    "name": "Alice",
    "age": 30,
    "active": True,
    "avatar": b"\x01\x02\x03",
}
data = surp.dumps(payload, dedup=True, sort_keys=True)
decoded = surp.loads(data)
assert decoded == payload
view = surp.loads_value(data)
assert view["name"].value == "Alice"
One codec, traced from caller to bytes.
Every surface — the CLI, the Python module, the C ABI, the MCP server — collapses into the same in-memory Value tree, which the encoder walks to produce the v1 block stream. Decoding plays the same stages in reverse, with limits and checksums checked before any value crosses back into caller code.
Inputs (file, network, FFI buffer) are untrusted. Limits, checksums, and varint bounds are enforced before allocations grow.
All language surfaces call into surp-core. Behaviour matches across Python, Rust, the CLI, and the C ABI by construction.
CBF/CTN/CQL live under their own namespace. v1 files stay byte-compatible forever; nothing in RFC-001 changes how a .surp file is read.
A block, byte by byte.
v1 stores values inside framed blocks. Each block is self-describing: type, length, compression flag, payload, and a per-block XXH64 checksum. The trailer closes the file with an overall checksum and (optionally) an index for random access.
- 1. Value tree
Caller hands in a Rust struct (via derive) or a Python dict → mapped to ordered Value::Object.
- 2. Walk + bound
Encoder walks depth-first, checks size budget at each level, picks varint width per length.
- 3. String dedup
If enabled, repeated strings are replaced by indices into a side string table block.
- 4. Block framing
Payload is wrapped with type / len / comp / xxh64. Compression (zstd/lz4) is optional and per-block.
- 5. Trailer
Overall XXH64 closes the file. Optional index block enables random-access reads.
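The block framing stage can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not Surp's actual layout: the field widths, the checksum coverage, and the names `frame_block` and `read_block` are hypothetical, CRC32 stands in for XXH64, and zlib stands in for zstd/lz4.

```python
import struct
import zlib

# Hypothetical layout, for illustration only (the real v1 field widths differ):
# type (1 byte) | compression flag (1 byte) | body length (4 bytes LE) | body | checksum (4 bytes LE)
HEADER = struct.Struct("<BBI")
CHECKSUM = struct.Struct("<I")


def frame_block(block_type: int, payload: bytes, compress: bool = False) -> bytes:
    """Wrap a payload with a header and a trailing per-block checksum."""
    # zlib stands in for zstd/lz4 here.
    body = zlib.compress(payload) if compress else payload
    header = HEADER.pack(block_type, 1 if compress else 0, len(body))
    # CRC32 stands in for XXH64; it covers header and body in this sketch.
    checksum = zlib.crc32(header + body)
    return header + body + CHECKSUM.pack(checksum)


def read_block(buf: bytes, offset: int = 0):
    """Verify and unwrap one block; returns (type, payload, next_offset)."""
    block_type, comp, length = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    end = start + length
    if end + CHECKSUM.size > len(buf):
        raise ValueError("truncated block")
    (expected,) = CHECKSUM.unpack_from(buf, end)
    # Verify before decompressing or handing anything back to the caller.
    if zlib.crc32(buf[offset:end]) != expected:
        raise ValueError("checksum mismatch")
    body = buf[start:end]
    payload = zlib.decompress(body) if comp else body
    return block_type, payload, end + CHECKSUM.size
```

Because each block carries its own length and checksum, a reader can call `read_block` repeatedly with the returned offset, and a corrupted block fails on its own without affecting the verification of its neighbours.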
Small reads stay small. Big reads stay safe.
- Length-prefixed blocks let a reader skip past a payload it doesn't care about — no streaming-state machine, no peek-ahead.
- Per-block XXH64 means corruption is localized; a damaged block fails fast, the others still verify.
- Optional string-dedup block trades a small encode cost for repeated string savings — see string_heavy.
- Limits before allocation — max-depth, max-bytes, max-strings — are enforced from the varint header alone, so an adversarial input never grows memory.
- Zero-copy borrows through SurpValue<'a> only apply to uncompressed v1 data; compressed blocks fall back to owned Value.
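The "limits before allocation" idea from the list above can be sketched as follows. This is a minimal example that assumes a LEB128-style unsigned varint; Surp's exact varint encoding is not specified here, and `MAX_BYTES`, `read_uvarint`, and `read_bytes_field` are hypothetical names chosen for illustration.

```python
MAX_BYTES = 1 << 20  # hypothetical budget: 1 MiB per field


def read_uvarint(buf: bytes, offset: int = 0, max_len: int = 10):
    """Decode a LEB128-style unsigned varint; returns (value, next_offset)."""
    result = 0
    shift = 0
    for i in range(max_len):
        if offset + i >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset + i]
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            return result, offset + i + 1
        shift += 7
    # Bounding the varint itself stops an endless run of continuation bytes.
    raise ValueError("varint too long")


def read_bytes_field(buf: bytes, offset: int = 0, max_bytes: int = MAX_BYTES):
    """Read a length-prefixed field, rejecting bad lengths before slicing."""
    length, pos = read_uvarint(buf, offset)
    # Both checks use only the declared length, so nothing is allocated yet.
    if length > max_bytes:
        raise ValueError("declared length exceeds limit")
    if pos + length > len(buf):
        raise ValueError("declared length exceeds input")
    return buf[pos:pos + length], pos + length
```

With this shape, an input that declares a 4 GiB string in five header bytes is rejected after those five bytes are read. The same returned offset also lets a reader skip a field it does not need.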
From source to a checked .surp file.
Every encode follows the same set of stages, and decode plays them back in reverse.
- Schema in mind
You start with a Rust struct, a Python dict, or a JSON file. Surp doesn't require a schema, but a derive macro can encode IDs for stable evolution.
- Encode
Encoder walks the value tree, emits varints for lengths, optionally dedups strings. No compression by default.
- Block framing
The block writer prefixes payloads with type + length + compression type + an XXH64 checksum.
- Trailer
A final overall checksum closes the file. With an optional index block, the container also supports random-access reads.
- Verified bytes
On decode, limits and checksums are checked before any value crosses the boundary into caller code.
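The optional string dedup in the Encode stage above amounts to replacing repeated strings with indices into a side table. The sketch below shows only that idea; how Surp actually lays out its string table block is not described here, and `dedup_strings` and `restore_strings` are hypothetical helper names.

```python
def dedup_strings(strings):
    """Return (table, refs): unique strings in first-seen order, plus indices."""
    table = []
    index_of = {}
    refs = []
    for s in strings:
        if s not in index_of:
            index_of[s] = len(table)
            table.append(s)
        refs.append(index_of[s])
    return table, refs


def restore_strings(table, refs):
    """Inverse of dedup_strings: expand indices back into strings."""
    return [table[i] for i in refs]


tags = ["admin", "ops", "admin", "admin"]
table, refs = dedup_strings(tags)
# table holds each distinct string once; refs holds one small integer per use.
assert restore_strings(table, refs) == tags
```

The trade-off matches the one described above: the encoder pays for a dictionary lookup per string, and payloads with many repeated strings shrink because each repeat costs only an index.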
Read the spec. Ship a value.
Everything on this site is generated from the Surp source. The links below take you straight to verified material.