A compact, canonical
binary serializer.
Human-readable when you need it.

Surp is a Rust-backed serialization toolkit: a stable, block-framed v1 binary format with XXH64 checksums and optional compression, paired with a text notation, a CLI, native Python bindings, and a C ABI. The additive RFC-001 work lives in its own namespace.

USER.SURP · V1 TEXT NOTATION · XXH64 ✓
{
  id: 1001;
  name: "Alice";
  active: true;
  tags: ["admin", "ops"];
  settings: {
    theme: "dark";
    region: "us";
  };
  avatar: b64#AQID;
}
$ surp from-json user.json -o user.surp
written · 124 bytes · 1 block · checksum valid · ~0.27ms
Rust

Encode and decode without ceremony.

A small surface area: Value, Encoder, Decoder, and the borrowed SurpValue<'a>. Derive macros cover named structs.

rust · verified from README.md
use surp_core::{Decoder, Encoder, Value};

let value = Value::Object(vec![
    ("name".into(), Value::Str("Alice".into())),
    ("age".into(),  Value::UInt(30)),
    ("active".into(), Value::Bool(true)),
]);

let mut encoder = Encoder::new();
encoder.encode_value(&value)?;
let bytes = encoder.finish()?;

let mut decoder = Decoder::new(&bytes);
let decoded = decoder.decode_next()?.to_owned_value();
assert_eq!(decoded, value);
Python

A package called surp. Nothing else to learn.

PyO3 module with the obvious dumps/loads, plus a typed SurpValue view for discoverable access.

python · verified from README.md
import surp

payload = {
    "name": "Alice",
    "age": 30,
    "active": True,
    "avatar": b"\x01\x02\x03",
}

data = surp.dumps(payload, dedup=True, sort_keys=True)
decoded = surp.loads(data)
assert decoded == payload

view = surp.loads_value(data)
assert view["name"].value == "Alice"
System design · high-level

One codec, traced from caller to bytes.

Every surface — the CLI, the Python module, the C ABI, the MCP server — collapses into the same in-memory Value tree, which the encoder walks to produce the v1 block stream. Decoding plays the same stages in reverse, with limits and checksums checked before any value crosses back into caller code.

  • Surfaces: surp-cli (binary tool), surp-python (PyO3 module), surp-ffi (C ABI), surp-mcp (MCP server), surp-derive (#[derive(Surp)]).
  • Core (surp-core): Value / SurpValue<'a> (ordered Object · Array · scalars · Bytes); Encoder (walks · varints · dedup table); Decoder (bounds · limits · zero-copy).
  • Bytes: magic SURP, version u8, block 0 (type · len · ck), block 1 (… payload …), optional string table (dedup), trailer (XXH64).
  • Trust: limits and checksums are verified before allocation.
FIG · HIGH-LEVEL DATA FLOW
Trust boundary

Inputs (file, network, FFI buffer) are untrusted. Limits, checksums, and varint bounds are enforced before allocations grow.
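A minimal sketch of the "limit before allocation" idea, in plain Python rather than Surp's own code. It assumes an unsigned LEB128 varint for the length prefix; the source does not specify Surp's varint encoding. The names `read_varint`, `read_bounded`, `LimitExceeded`, and the `MAX_BYTES` budget are hypothetical.

```python
import io

MAX_BYTES = 1 << 20  # hypothetical budget, not a Surp default


class LimitExceeded(ValueError):
    """A declared length is larger than the configured budget."""


def read_varint(stream, max_len=10):
    """Decode an unsigned LEB128 varint (assumed encoding), at most max_len bytes."""
    result = 0
    for i in range(max_len):
        raw = stream.read(1)
        if not raw:
            raise ValueError("truncated varint")
        result |= (raw[0] & 0x7F) << (7 * i)
        if (raw[0] & 0x80) == 0:
            return result
    raise ValueError("varint too long")


def read_bounded(stream, max_bytes=MAX_BYTES):
    """Read a length-prefixed field, rejecting oversized lengths before reading."""
    declared = read_varint(stream)
    # The check uses only the header, so no buffer is sized from untrusted input.
    if declared > max_bytes:
        raise LimitExceeded(f"declared {declared} bytes, limit is {max_bytes}")
    data = stream.read(declared)
    if len(data) != declared:
        raise ValueError("truncated payload")
    return data


print(read_bounded(io.BytesIO(b"\x03abc")))  # b'abc'
```

An input that declares a 4 GiB field but carries five bytes fails at the limit check, before any payload read is attempted.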

Single codec

All language surfaces call into surp-core. Behaviour matches across Python, Rust, the CLI, and the C ABI by construction.

Additive RFC-001

CBF/CTN/CQL live under their own namespace. v1 files stay byte-compatible forever; nothing in RFC-001 changes how a .surp file is read.

System design · low-level

A block, byte by byte.

v1 stores values inside framed blocks. Each block is self-describing: type, length, compression flag, payload, and a per-block XXH64 checksum. The trailer closes the file with an overall checksum and (optionally) an index for random access.

One framed block (self-describing, independently checksummed), in order of increasing byte offset: type (u8) · len (varint) · comp (u8) · payload (raw or zstd / lz4 compressed) · checksum (XXH64) · next block
FIG · BLOCK LAYOUT (V1)
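The layout above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Surp's encoder: the varint is assumed to be unsigned LEB128, the checksum is assumed to cover header plus payload, and an 8-byte BLAKE2b digest stands in for XXH64 because XXH64 is not in the Python standard library. `frame_block` and `parse_block` are hypothetical names, and bytes produced here are not valid .surp data.

```python
import hashlib

CHECKSUM_LEN = 8  # XXH64 digests are 8 bytes; the stand-in matches that size


def checksum8(data):
    # Stand-in for XXH64: an 8-byte BLAKE2b digest from the standard library.
    return hashlib.blake2b(data, digest_size=CHECKSUM_LEN).digest()


def encode_varint(n):
    """Unsigned LEB128 (assumed encoding)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Return (value, next_position)."""
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def frame_block(block_type, payload, comp=0):
    """type (u8) + len (varint) + comp (u8) + payload + 8-byte checksum."""
    body = bytes([block_type]) + encode_varint(len(payload)) + bytes([comp]) + payload
    return body + checksum8(body)


def parse_block(buf, offset=0):
    """Return (type, comp, payload, next_offset); raise on damage."""
    if offset >= len(buf):
        raise ValueError("no block at offset")
    block_type = buf[offset]
    length, pos = decode_varint(buf, offset + 1)
    if pos >= len(buf):
        raise ValueError("truncated header")
    comp = buf[pos]
    start, end = pos + 1, pos + 1 + length
    if end + CHECKSUM_LEN > len(buf):
        raise ValueError("truncated block")
    if checksum8(buf[offset:end]) != buf[end:end + CHECKSUM_LEN]:
        raise ValueError("checksum mismatch")
    return block_type, comp, buf[start:end], end + CHECKSUM_LEN
```

Because `parse_block` returns the offset of the next block, a caller can walk a file one frame at a time without any other state.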
Encode pipeline
  1. Value tree
    Caller hands in a Rust struct (via derive) or a Python dict → mapped to ordered Value::Object.
  2. Walk + bound
    Encoder walks depth-first, checks size budget at each level, picks varint width per length.
  3. String dedup
    If enabled, repeated strings are replaced by indices into a side string table block.
  4. Block framing
    Payload is wrapped with type / len / comp / xxh64. Compression (zstd/lz4) is optional and per-block.
  5. Trailer
    Overall XXH64 closes the file. Optional index block enables random-access reads.
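Step 3 is the least obvious stage, so here is a small sketch of string dedup over a JSON-like Python value. It only illustrates the idea of a side table; how Surp actually chooses which strings to table, and how references are encoded on the wire, is not described in the source. The `("ref", index)` marker and the `dedup` / `restore` names are hypothetical.

```python
from collections import Counter


def _count(value, counts):
    """Count every string (keys included) in a dict/list/scalar tree."""
    if isinstance(value, str):
        counts[value] += 1
    elif isinstance(value, dict):
        for key, item in value.items():
            counts[key] += 1
            _count(item, counts)
    elif isinstance(value, list):
        for item in value:
            _count(item, counts)


def dedup(value):
    """Return (tree, table): repeated strings become ("ref", index) markers."""
    counts = Counter()
    _count(value, counts)
    index_of = {}  # string -> table index, in first-seen order

    def ref(text):
        if counts[text] < 2:
            return text  # unique strings stay inline
        return ("ref", index_of.setdefault(text, len(index_of)))

    def walk(node):
        if isinstance(node, str):
            return ref(node)
        if isinstance(node, dict):
            out = {}
            for key, item in node.items():
                new_key = ref(key)
                out[new_key] = walk(item)
            return out
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(value), list(index_of)


def restore(node, table):
    """Inverse of dedup: swap markers back for their strings."""
    if isinstance(node, tuple):
        return table[node[1]]
    if isinstance(node, dict):
        return {restore(k, table): restore(v, table) for k, v in node.items()}
    if isinstance(node, list):
        return [restore(item, table) for item in node]
    return node


records = [{"region": "us", "name": "Alice"}, {"region": "us", "name": "Bob"}]
tree, table = dedup(records)
print(table)  # ['region', 'us', 'name']
```

The saving grows with repetition: each tabled string is stored once, and every further occurrence costs only an index.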
Why these choices

Small reads stay small. Big reads stay safe.

  • Length-prefixed blocks let a reader skip past a payload it doesn't care about — no streaming-state machine, no peek-ahead.
  • Per-block XXH64 means corruption is localized; a damaged block fails fast, the others still verify.
  • Optional string-dedup block trades a small encode cost for repeated string savings — see string_heavy.
  • Limits before allocation: max-depth, max-bytes, and max-strings are enforced from the varint header alone, so an adversarial input cannot force allocations beyond the configured limits.
  • Zero-copy borrows through SurpValue<'a> only apply to uncompressed v1 data; compressed blocks fall back to owned Value.
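The first two bullets can be illustrated with a toy scanner: the length prefix tells the reader where the next block starts, and each block's checksum is judged on its own. As a sketch it assumes an unsigned LEB128 length, a checksum over header plus payload, and an 8-byte BLAKE2b digest as a stand-in for XXH64 (which the Python standard library lacks). `frame` and `scan` are hypothetical helpers, not Surp APIs, and the sketch only covers damage that leaves the length field intact.

```python
import hashlib

CHECKSUM_LEN = 8


def _ck(data):
    # Stand-in for XXH64: 8-byte BLAKE2b digest.
    return hashlib.blake2b(data, digest_size=CHECKSUM_LEN).digest()


def _varint(n):
    """Unsigned LEB128 (assumed encoding)."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def _read_varint(buf, pos):
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def frame(block_type, payload):
    """type + varint length + comp byte (0 = raw) + payload + checksum."""
    body = bytes([block_type]) + _varint(len(payload)) + b"\x00" + payload
    return body + _ck(body)


def scan(buf):
    """Yield (block_type, payload, ok) per block; a bad checksum marks only that block."""
    pos = 0
    while pos < len(buf):
        start = pos
        block_type = buf[pos]
        length, pos = _read_varint(buf, pos + 1)
        pos += 1  # step over the compression byte
        end = pos + length
        if end + CHECKSUM_LEN > len(buf):
            raise ValueError("truncated block")
        ok = _ck(buf[start:end]) == buf[end:end + CHECKSUM_LEN]
        yield block_type, buf[pos:end], ok
        pos = end + CHECKSUM_LEN  # the length prefix locates the next block


first = frame(1, b"alpha")
stream = bytearray(first + frame(2, b"beta") + frame(1, b"gamma"))
stream[len(first) + 3] ^= 0xFF  # damage one payload byte of the middle block
print([ok for _, _, ok in scan(bytes(stream))])  # [True, False, True]
```

A reader that only wants type-1 blocks can filter on the first tuple field and never inspect the other payloads.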
The shape of a release

From source to a checked .surp file.

Every encode follows the same set of stages, and decode plays them back in reverse.

  1. Schema in mind

    You start with a Rust struct, a Python dict, or a JSON file. Surp doesn't require a schema, but a derive macro can encode IDs for stable evolution.

  2. Encode

    Encoder walks the value tree, emits varints for lengths, optionally dedups strings. No compression by default.

  3. Block framing

    The block writer prefixes payloads with type + length + compression type + an XXH64 checksum.

  4. Trailer

    A final overall checksum closes the file. If the optional index block is written, the container also supports random-access reads.

  5. Verified bytes

    On decode, limits and checksums are checked before any value crosses the boundary into caller code.

Read the spec. Ship a value.

Everything on this site is generated from the Surp source. The links below take you straight to verified material.