A serializer drawn from four neat lanes.
Surp's repository is a Cargo workspace of nine crates, a Python package, and a small set of fixtures and benches. Underneath the surface area, one codec carries the load — and one wire format is the only stable contract.
The workspace at a glance
The codec lives in surp-core. The public surfaces (surp-cli, surp-python, surp-ffi, surp-derive) all depend on it, never the other way around. IO and storage adapters (surp-io, surp-compression, surp-simd) sit beside the codec and are pulled in via Cargo features. The RFC-001 work lives entirely inside surp_core::rfc001, with its own namespace and its own file extension, and is never mixed with v1 .surp bytes.
From a value tree to a checked file
Every encode passes through the same stages, in order. The encoder walks a Value tree and emits scalars with type tags and varint-encoded lengths. When string deduplication is on, repeated strings are interned into a dedup table that sits inside the same block. The block writer then prefixes the payload with a type byte, the payload length, a compression-type byte, and an XXH64 checksum of the uncompressed payload. A trailer block carries the overall checksum; readers verify both the per-block checksum and the overall checksum before exposing any value.
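The framing step can be sketched as follows. This is a minimal illustration, not Surp's actual byte layout: the field order follows the description above, but the LEB128-style varint, the varint-encoded payload length, the little-endian checksum, and the function names are all assumptions. The checksum is an FNV-1a stand-in used only to keep the sketch dependency-free; the real format uses XXH64.

```rust
/// Append `n` as an unsigned LEB128-style varint (7 value bits per byte).
/// Assumption: the source only says "varint", not which flavor.
fn write_varint(mut n: u64, out: &mut Vec<u8>) {
    while n >= 0x80 {
        out.push((n as u8 & 0x7f) | 0x80); // low 7 bits, continuation bit set
        n >>= 7;
    }
    out.push(n as u8); // final byte, continuation bit clear
}

/// Read a varint back, returning the value and the number of bytes consumed.
/// Gives up after 10 bytes, the most a u64 can need.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Stand-in 64-bit checksum (FNV-1a). NOT XXH64; used here only so the
/// sketch needs no external crate.
fn placeholder_checksum(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical block writer: type byte, payload length, compression-type
/// byte, checksum of the uncompressed payload, then the payload itself.
fn frame_block(block_type: u8, compression: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 20);
    out.push(block_type);
    write_varint(payload.len() as u64, &mut out);
    out.push(compression);
    out.extend_from_slice(&placeholder_checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}
```

Calling `frame_block(1, 0, b"hello")` in this sketch yields a 16-byte buffer: three header bytes, eight checksum bytes, and the five payload bytes.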
Where untrusted bytes become a value
The decoder is the only piece of code allowed to look at raw bytes. Limits (max depth, max element counts, max payload sizes) are enforced before allocation; checksum verification fails closed; corrupt or oversized inputs never reach a constructed Value. The Rust API exposes two value flavors: Value for owned trees, and SurpValue<'a> for borrowed zero-copy decode of uncompressed v1 data.
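The "limits before allocation" rule can be illustrated with a small sketch. Everything named here (`Limits`, `DecodeError`, `read_bytes`, and the 4-byte little-endian length prefix) is hypothetical and chosen for brevity; it is not the surp-core API. The point is only the ordering: the declared length is checked against the limit and against the remaining input before any buffer is created.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    /// The input ended before the declared data did.
    Truncated,
    /// The declared length exceeds the configured limit.
    LimitExceeded { declared: usize, max: usize },
}

/// Hypothetical resource limits; a real decoder would also track depth
/// and element counts.
struct Limits {
    max_payload_len: usize,
}

/// Read a length-prefixed byte string (assumed framing: 4-byte
/// little-endian length, then the bytes).
fn read_bytes(input: &[u8], limits: &Limits) -> Result<Vec<u8>, DecodeError> {
    if input.len() < 4 {
        return Err(DecodeError::Truncated);
    }
    let declared = u32::from_le_bytes([input[0], input[1], input[2], input[3]]) as usize;

    // Check the attacker-controlled length before allocating anything.
    if declared > limits.max_payload_len {
        return Err(DecodeError::LimitExceeded {
            declared,
            max: limits.max_payload_len,
        });
    }

    let body = &input[4..];
    if body.len() < declared {
        return Err(DecodeError::Truncated);
    }

    // Only now is memory allocated, and only for a validated size.
    Ok(body[..declared].to_vec())
}
```

With a limit of 8 bytes, a four-byte input that declares a length of `0xffff_ffff` is rejected with `LimitExceeded` without allocating.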
One format, two flavors of decode
Owned — Value
Allocates and owns its children. Use when you want a long-lived tree, mutation, or to ship the value across thread boundaries. Always available, including for compressed payloads.
Borrowed — SurpValue<'a>
Zero-copy view tied to the original byte buffer. Available for uncompressed v1 data. Pay nothing on decode; pay only when you ask a field for an owned string or array.
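The difference between the two flavors can be shown with a pair of toy types. `BorrowedValue` and `OwnedValue` below are hypothetical stand-ins, deliberately not named `SurpValue<'a>` or `Value`, and they cover only two variants; the sketch assumes nothing about how the real types are defined. It shows the general Rust pattern: the borrowed form holds a `&'a str` pointing into the input buffer, and an allocation happens only on conversion to the owned form.

```rust
/// Borrowed view: string data points into the original input buffer.
#[derive(Debug, PartialEq)]
enum BorrowedValue<'a> {
    Int(i64),
    Str(&'a str),
}

/// Owned tree: string data is copied and outlives the input buffer.
#[derive(Debug, PartialEq, Clone)]
enum OwnedValue {
    Int(i64),
    Str(String),
}

impl<'a> BorrowedValue<'a> {
    /// The only place in this sketch where an allocation happens.
    fn to_owned_value(&self) -> OwnedValue {
        match *self {
            BorrowedValue::Int(n) => OwnedValue::Int(n),
            BorrowedValue::Str(s) => OwnedValue::Str(s.to_owned()),
        }
    }
}

/// Hypothetical "decode" of a bare UTF-8 string with no framing:
/// validates the bytes and returns a view without copying them.
fn view_str(buf: &[u8]) -> Option<BorrowedValue<'_>> {
    std::str::from_utf8(buf).ok().map(BorrowedValue::Str)
}
```

The borrow checker enforces the trade-off: a `BorrowedValue` cannot outlive the buffer it points into, while the result of `to_owned_value` can be kept after the buffer is dropped or moved to another thread.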
What each crate is responsible for
- surp-core: The codec, comprising the encoder, decoder, value tree, block framing, text notation, resource limits, and the RFC-001 modules.
- surp-derive: #[derive(Surp)] and #[derive(SurpSchema)] for named Rust structs; stable numeric field IDs for forward-compatible schema evolution.
- surp-cli: The surp binary tool. Verb-driven; converts JSON↔v1, encodes/decodes the text notation, inspects, validates, runs CLI benches and the RFC-001 commands.
- surp-python: PyO3 extension that exports the Python package named surp; ships SurpValue views, Encoder/SurpDecoder, and the surp.model RFC-001 validation layer.
- surp-io: Tokio framed IO, shared buffers via the bytes crate, optional mmap reader for memory-mapped decode.
- surp-compression: Compression trait and optional zstd, lz4, and snappy adapters. All three are feature-gated; none are required.
- surp-ffi: C ABI helpers that provide JSON-to-Surp and Surp-to-JSON buffer entry points for embedding in non-Rust hosts.
- surp-simd: Scalar-safe scanning helpers and an optional aarch64 SIMD varint pre-scan path.
- bench: Criterion-driven Rust and Python benchmark harnesses with deterministic datasets and committed result fixtures.
- fuzz: cargo-fuzz targets and corpora for the decoder, the text parser, varints, block framing, and full roundtrips. Excluded from the workspace build by design.
Why this shape, and what it doesn't try to be
Three explicit tradeoffs steer the design. Safety over micro-optimization: checksums are verified before payloads are exposed, and resource limits sit between input and allocation. Determinism: the same input value produces the same bytes every time, a property that makes diffing, content-addressing, and replay tractable. Schema evolution as a first-class feature: the derive macros encode stable numeric field IDs, and unknown fields are skipped on decode, so old readers gracefully ignore new fields.
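The schema-evolution point can be made concrete with a small sketch. The record layout below (a one-byte field ID, a one-byte length, then the value bytes) is invented for illustration and is not the Surp wire format; `UserV1` and `decode_user_v1` are likewise hypothetical. What it shows is the mechanism: because every field carries a numeric ID and a length, a reader that does not recognize an ID can step over that field's bytes and continue.

```rust
/// An "old" reader's view of the record. It knows only field IDs 1 and 2.
#[derive(Debug, Default, PartialEq)]
struct UserV1 {
    id: u8,       // field ID 1
    name: String, // field ID 2
}

/// Decode a sequence of (field id, length, value) entries, skipping any
/// field ID this reader does not know about.
fn decode_user_v1(mut buf: &[u8]) -> Option<UserV1> {
    let mut user = UserV1::default();
    while let [field_id, len, rest @ ..] = buf {
        let len = usize::from(*len);
        if rest.len() < len {
            return None; // declared length runs past the end of the input
        }
        let (value, tail) = rest.split_at(len);
        match *field_id {
            1 => user.id = *value.first()?,
            2 => user.name = std::str::from_utf8(value).ok()?.to_owned(),
            _ => {} // unknown field: its bytes are skipped, decoding continues
        }
        buf = tail;
    }
    // A stray trailing byte means the input was malformed.
    if buf.is_empty() {
        Some(user)
    } else {
        None
    }
}
```

Given the bytes `[1, 1, 7, 3, 2, 0xaa, 0xbb, 2, 2, b'a', b'l']`, where a newer writer has added field ID 3, this reader still returns `id: 7` and `name: "al"`.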
Surp is not a streaming-only format and not an in-place editable format. It is a canonical container for value trees with optional random access via the index block. Anything that looks like a database, an RPC framework, or a schema registry is out of scope.