4 releases

new 0.2.2	May 4, 2026
0.2.1	May 4, 2026
0.2.0	May 2, 2026
0.1.0	May 2, 2026

mime-tree

RFC 5322 / MIME parser that produces a walkable, byte-range-indexed part tree. Given raw message bytes, it returns a ParsedMessage with the full MIME structure, RFC 8621-compatible body views, and on-demand body decoding.

Why this crate exists

Most MIME parsers either give back owned strings (losing the original byte positions needed for S/MIME signature verification) or expose the underlying parsing library's types in their API (locking callers to that dependency). mime-tree gives you (offset, length) byte ranges into your original &[u8] buffer — so you can feed the exact bytes of a signed part directly to a cryptographic verifier without copying or re-encoding. The parsed result is fully owned, lifetime-free, and Serialize + Deserialize, so it round-trips through any store or message bus.

For S/MIME sign/verify/encrypt/decrypt, see the companion crate smime-tree.

Quick example

use mime_tree::{parse, decode_body_value};

let raw: &[u8] = b"From: alice@example.com\r\n\
                   Content-Type: text/plain; charset=utf-8\r\n\
                   \r\n\
                   Hello, world!\r\n";

let msg = parse(raw).expect("parse failed");

for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    let decoded = decode_body_value(raw, part, None).unwrap();
    println!("{}", decoded.value);
}

Key types

`ParsedMessage`

The result of parse(). All fields are owned; no lifetime parameters.

Field	Type	Description
`part_index`	`ParsedPart`	Root of the MIME part tree
`text_body`	`Vec<String>`	Part IDs of text/plain body parts (RFC 8621 §4.1.4)
`html_body`	`Vec<String>`	Part IDs of text/html body parts
`attachments`	`Vec<String>`	Part IDs of attachment parts
`headers`	`Vec<ParsedHeader>`	Top-level message headers
`preview`	`Option<String>`	First ~256 chars of text content
`warnings`	`Vec<String>`	Non-fatal parse warnings

ParsedMessage implements Serialize + Deserialize.

`ParsedPart`

A single node in the MIME tree.

Field	Type	Description
`part_id`	`String`	IMAP dotted-path ID: `"1"`, `"1.1"`, `"1.2"`, …
`content_type`	`String`	Media type/subtype, e.g. `"text/plain"`
`charset`	`Option<String>`	Charset from Content-Type, if present
`transfer_encoding`	`TransferEncoding`	See table below
`disposition`	`Option<String>`	Content-Disposition value
`filename`	`Option<String>`	Filename from Content-Disposition or Content-Type
`cid`	`Option<String>`	Content-ID header value
`header_range`	`(u32, u32)`	`(offset, length)` of part headers in original bytes
`body_range`	`(u32, u32)`	`(offset, length)` of part body (pre-decode) in original bytes
`children`	`Vec<ParsedPart>`	Child parts — non-empty for `multipart/*` only
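
To make the dotted-path IDs and the `children` field concrete, here is a minimal std-only sketch of a lookup by part ID. `Node`, `leaf`, and this `find_by_id` are hypothetical stand-ins written for illustration; they are not the crate's types, and the real lookup may work differently.

```rust
/// Hypothetical stand-in for a MIME tree node; only the fields needed here.
#[derive(Debug)]
struct Node {
    part_id: String,
    children: Vec<Node>,
}

/// Depth-first search for the node whose `part_id` equals `id`.
fn find_by_id<'a>(node: &'a Node, id: &str) -> Option<&'a Node> {
    if node.part_id == id {
        return Some(node);
    }
    // Recurse into children and return the first match, if any.
    node.children.iter().find_map(|child| find_by_id(child, id))
}

/// Builds a leaf node (no children), e.g. a text/plain part.
fn leaf(id: &str) -> Node {
    Node { part_id: id.to_string(), children: Vec::new() }
}
```

A leaf returns itself when the ID matches and `None` otherwise; a multipart node delegates to its children in order.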

Byte ranges use u32 so the serialized representation is identical on 32-bit and 64-bit hosts. This limits addressable input to 4 GiB, which is far larger than any realistic MIME message.
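
Turning an `(offset, length)` pair back into a slice of the original buffer can be sketched as follows. `slice_range` is a hypothetical helper, not part of the crate; it only shows bounds-checked slicing so that a stale or corrupted range yields `None` instead of a panic.

```rust
/// Returns the bytes described by an `(offset, length)` pair, or `None`
/// if the range does not fit inside `raw`.
fn slice_range(raw: &[u8], range: (u32, u32)) -> Option<&[u8]> {
    let start = range.0 as usize;
    // checked_add guards against overflow on 32-bit hosts.
    let end = start.checked_add(range.1 as usize)?;
    raw.get(start..end)
}
```

The returned slice borrows from `raw`, so the exact original bytes of a part can be handed to a verifier without copying.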

`TransferEncoding` variants

Variant	CTE header value(s)
`Identity`	header absent (also the fallback for unknown values)
`QuotedPrintable`	`quoted-printable`
`Base64`	`base64`
`UUEncode`	`x-uuencode`, `x-uue`, `uuencode`
`SevenBit`	`7bit`
`EightBit`	`8bit`
`Binary`	`binary`

Unknown CTE values fall back to Identity and add a warning to ParsedMessage::warnings.
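
The fallback rule can be sketched with a local enum that mirrors the variant names in the table. This is an illustration under assumptions, not the crate's code: `classify_cte` is a hypothetical function, the warning text is made up, and the sketch assumes `7bit`, `8bit`, and `binary` map to their dedicated variants. Matching is case-insensitive because RFC 2045 defines the header value that way.

```rust
/// Local stand-in mirroring the variant names listed above.
#[derive(Debug, PartialEq)]
enum TransferEncoding {
    Identity,
    QuotedPrintable,
    Base64,
    UUEncode,
    SevenBit,
    EightBit,
    Binary,
}

/// Maps a Content-Transfer-Encoding header value to a variant.
/// Unknown values fall back to `Identity` and record a warning.
fn classify_cte(header: Option<&str>, warnings: &mut Vec<String>) -> TransferEncoding {
    let value = match header {
        Some(v) => v.trim().to_ascii_lowercase(),
        None => return TransferEncoding::Identity, // no header at all
    };
    match value.as_str() {
        "quoted-printable" => TransferEncoding::QuotedPrintable,
        "base64" => TransferEncoding::Base64,
        "x-uuencode" | "x-uue" | "uuencode" => TransferEncoding::UUEncode,
        "7bit" => TransferEncoding::SevenBit,
        "8bit" => TransferEncoding::EightBit,
        "binary" => TransferEncoding::Binary,
        other => {
            warnings.push(format!("unknown Content-Transfer-Encoding: {}", other));
            TransferEncoding::Identity
        }
    }
}
```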

`DecodedBodyValue`

Returned by decode_body_value().

Field	Type	Description
`value`	`String`	Decoded, charset-converted text
`is_truncated`	`bool`	True if `max_bytes` limit was reached
`is_encoding_problem`	`bool`	True if transfer-decode or charset conversion encountered an error

Decoding body content

decode_body_value slices the raw bytes using a part's body_range, applies transfer-encoding decode (Base64, Quoted-Printable, UUencode, etc.), and charset-converts the result to UTF-8 via encoding_rs. Decoding is on-demand — parse time is O(message size) and does not decode any bodies.

// Decode with a 64 KiB preview cap (pass None for unlimited).
let decoded = decode_body_value(raw, &part, Some(65_536))?;
if decoded.is_truncated {
    // body was larger than max_bytes
}
if decoded.is_encoding_problem {
    // transfer-decode or charset conversion hit an error; `value` may be partial
}
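
To illustrate how the two flags can arise, here is a std-only sketch of a quoted-printable decoder with a byte cap. It is not the crate's implementation: `Decoded`, `hex_val`, and `decode_qp` are hypothetical names, only quoted-printable is handled, and the charset is assumed to be UTF-8 rather than converted via encoding_rs.

```rust
/// Local stand-in for the result shape described above.
#[derive(Debug)]
struct Decoded {
    value: String,
    is_truncated: bool,
    is_encoding_problem: bool,
}

/// Value of one ASCII hex digit, or `None` for any other byte.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decodes quoted-printable `body`, stopping once `max_bytes` decoded bytes exist.
fn decode_qp(body: &[u8], max_bytes: Option<usize>) -> Decoded {
    let mut out: Vec<u8> = Vec::new();
    let mut is_truncated = false;
    let mut is_encoding_problem = false;
    let mut i = 0;
    while i < body.len() {
        if max_bytes.map_or(false, |max| out.len() >= max) {
            is_truncated = true; // input remains but the cap is reached
            break;
        }
        if body[i] != b'=' {
            out.push(body[i]);
            i += 1;
        } else if body[i + 1..].starts_with(b"\r\n") {
            i += 3; // soft line break: "=" CRLF
        } else if body[i + 1..].starts_with(b"\n") {
            i += 2; // soft line break with a bare LF
        } else {
            let byte = match body.get(i + 1..i + 3) {
                Some(&[hi, lo]) => hex_val(hi).zip(hex_val(lo)).map(|(h, l)| h * 16 + l),
                _ => None,
            };
            match byte {
                Some(b) => {
                    out.push(b);
                    i += 3;
                }
                None => {
                    // Malformed escape: keep the "=" literally and flag it.
                    is_encoding_problem = true;
                    out.push(b'=');
                    i += 1;
                }
            }
        }
    }
    // Invalid UTF-8 (including a character split by the cap) is flagged too.
    let value = match String::from_utf8(out) {
        Ok(text) => text,
        Err(err) => {
            is_encoding_problem = true;
            String::from_utf8_lossy(err.as_bytes()).into_owned()
        }
    };
    Decoded { value, is_truncated, is_encoding_problem }
}
```

Note that in this sketch a cap that lands in the middle of a multi-byte character sets both flags; a production decoder would probably back up to a character boundary instead.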

Inline UUencoded blocks

Some legacy messages — especially Usenet archives and mailing-list digests from the 1990s — embed UU-encoded files inside text/plain bodies with no Content-Transfer-Encoding header. Use scan_inline_uuencode to locate and decode those blocks:

use mime_tree::{parse, scan_inline_uuencode};

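// Minimal message whose text/plain body embeds the uuencoded bytes "Cat".
let raw: &[u8] = b"From: alice@example.com\r\n\
                   Content-Type: text/plain\r\n\
                   \r\n\
                   begin 644 cat.txt\r\n\
                   #0V%T\r\n\
                   `\r\n\
                   end\r\n";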
let msg = parse(raw).unwrap();

for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    for block in scan_inline_uuencode(raw, part) {
        if !block.is_encoding_problem {
            println!("found {} ({} bytes, mode {:o})",
                block.filename, block.data.len(), block.mode);
        }
    }
}

InlineUUBlock fields:

Field	Type	Description
`begin_offset`	`u32`	Absolute byte offset of the `begin` line in `raw`
`begin_length`	`u32`	Byte length of the entire block (through `end\n`)
`mode`	`u32`	Unix permission mode from the `begin` line
`filename`	`String`	Filename from the `begin` line
`data`	`Vec<u8>`	Decoded binary payload
`is_encoding_problem`	`bool`	True if the block was truncated or malformed
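
For readers unfamiliar with the format, the per-line decoding that produces `data` can be sketched as below. `uudecode_line` is a hypothetical std-only helper, not the crate's code. It assumes the classic encoding: the first character carries the decoded byte count, and each following group of four characters holds three bytes, six bits per character.

```rust
/// Decodes one uuencoded data line (without its line terminator).
/// Returns `None` if the line is empty or shorter than its length
/// character promises.
fn uudecode_line(line: &[u8]) -> Option<Vec<u8>> {
    // Each character encodes six bits; '`' and ' ' both map to zero.
    let val = |c: u8| c.wrapping_sub(32) & 0x3F;
    let (&len_char, rest) = line.split_first()?;
    let n = val(len_char) as usize;
    let mut out = Vec::with_capacity(n);
    for chunk in rest.chunks(4) {
        if out.len() >= n {
            break;
        }
        if chunk.len() < 4 {
            return None; // truncated group
        }
        let (a, b, c, d) = (val(chunk[0]), val(chunk[1]), val(chunk[2]), val(chunk[3]));
        out.push((a << 2) | (b >> 4));
        out.push((b << 4) | (c >> 2));
        out.push((c << 6) | d);
    }
    if out.len() < n {
        return None; // fewer bytes than the length character claims
    }
    out.truncate(n); // drop padding from the last group
    Some(out)
}
```

With this helper, the line `#0V%T` decodes to the three bytes `Cat`, and the terminating `` ` `` line decodes to an empty payload.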

Inline yEnc blocks

Usenet binary posts from the 2000s onward typically use yEnc encoding with no Content-Transfer-Encoding header — the article body is simply text/plain with =ybegin/=yend framing embedded in it. Use scan_inline_yencode to locate and decode those blocks:

use mime_tree::{parse, scan_inline_yencode};

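// Placeholder path: load the raw article bytes from wherever they are stored.
let data = std::fs::read("article.eml").expect("read failed");
let raw: &[u8] = &data;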
let msg = parse(raw).unwrap();

for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    for block in scan_inline_yencode(raw, part) {
        if !block.is_encoding_problem {
            println!("found {} ({} bytes)", block.filename, block.data.len());
        }
    }
}

A reasonable heuristic before calling: check whether the part's decoded text contains the byte sequence b"=ybegin ".
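
One way to write that pre-check is a plain byte search. `looks_like_yenc` is a hypothetical helper name; the sketch accepts the marker anywhere in the input, which is cheaper than a full scan but can produce false positives on text that merely quotes the marker.

```rust
/// Returns true if `body` contains the yEnc header marker "=ybegin ".
fn looks_like_yenc(body: &[u8]) -> bool {
    const MARKER: &[u8] = b"=ybegin ";
    // windows() yields nothing when `body` is shorter than the marker.
    body.windows(MARKER.len()).any(|w| w == MARKER)
}
```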

InlineYEncBlock fields:

Field	Type	Description
`begin_offset`	`u32`	Absolute byte offset of the `=ybegin` line in `raw`
`begin_length`	`u32`	Byte length of the entire block (through `=yend\n`)
`filename`	`String`	Filename from `=ybegin name=`
`file_size`	`u64`	Total file size from `=ybegin size=`
`part`	`Option<u32>`	Part number (multi-part only)
`total_parts`	`Option<u32>`	Total parts (multi-part only)
`part_begin`	`Option<u64>`	1-based start offset in full file (multi-part only)
`part_end`	`Option<u64>`	1-based end offset in full file (multi-part only)
`data`	`Vec<u8>`	Decoded binary payload
`crc32_verified`	`bool`	True if CRC32 was present and matched
`is_encoding_problem`	`bool`	True if the block was truncated, had a bad header, or CRC mismatch

For multi-part reassembly, pass each InlineYEncBlock's fields to yencoding_multi::Assembler.
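
The role of `part_begin` and `part_end` in reassembly can be sketched without any external crate. This is not the yencoding_multi API: `Part` and `reassemble` are hypothetical, and the sketch assumes the offsets are 1-based and inclusive, as the table above describes. It does not detect gaps or overlapping parts, and it allocates `file_size` bytes up front, so the size should be validated before use on untrusted input.

```rust
use std::convert::TryFrom;

/// Hypothetical, trimmed-down view of one decoded yEnc part.
struct Part {
    part_begin: u64, // 1-based, inclusive
    part_end: u64,   // 1-based, inclusive
    data: Vec<u8>,
}

/// Copies each part into its place in a buffer of `file_size` bytes.
/// Returns `None` if a range is out of bounds or disagrees with the
/// length of the part's data.
fn reassemble(file_size: u64, parts: &[Part]) -> Option<Vec<u8>> {
    let size = usize::try_from(file_size).ok()?;
    let mut out = vec![0u8; size];
    for p in parts {
        // Convert the 1-based inclusive range to a 0-based half-open one.
        let start = usize::try_from(p.part_begin.checked_sub(1)?).ok()?;
        let end = usize::try_from(p.part_end).ok()?;
        let dst = out.get_mut(start..end)?;
        if dst.len() != p.data.len() {
            return None;
        }
        dst.copy_from_slice(&p.data);
    }
    Some(out)
}
```

Because each part carries its own offsets, the parts can be supplied in any order.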

Design invariants

No JMAP dependency. This is a general-purpose MIME parser and does not depend on jmap-mail-types.
No S/MIME crypto. application/pkcs7-mime and application/pkcs7-signature parts are treated as opaque binary leaves. Use smime-tree for S/MIME processing.
Best-effort parsing. Malformed input yields a partial result plus warnings; only truly unparsable input (empty bytes, no headers) returns Err.
No async. Synchronous only.
Byte ranges, not stored bytes. The crate never retains the raw message bytes.
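
The best-effort invariant above (partial result plus warnings, `Err` only for unusable input) can be illustrated with a toy header splitter. `parse_headers` is hypothetical and much simpler than a real RFC 5322 parser: it ignores folded continuation lines and encoded words, and its warning text is invented for the example.

```rust
/// Splits a header block into `(name, value)` pairs.
/// Lines without a colon are skipped and reported as warnings;
/// only empty input is a hard error.
fn parse_headers(block: &str) -> Result<(Vec<(String, String)>, Vec<String>), String> {
    if block.trim().is_empty() {
        return Err("empty input".to_string());
    }
    let mut headers = Vec::new();
    let mut warnings = Vec::new();
    // lines() splits on LF and strips a trailing CR, so CRLF input works.
    for (n, line) in block.lines().enumerate() {
        if line.is_empty() {
            break; // blank line ends the header block
        }
        match line.split_once(':') {
            Some((name, value)) => {
                headers.push((name.trim().to_string(), value.trim().to_string()));
            }
            None => warnings.push(format!("line {}: no colon, skipped", n + 1)),
        }
    }
    Ok((headers, warnings))
}
```

The caller always gets whatever could be recovered and decides for itself whether the warnings matter.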

Specification references

RFC	Title
RFC 5322	Internet Message Format
RFC 2045	MIME Part One: Format of Internet Message Bodies
RFC 2046	MIME Part Two: Media Types (multipart boundaries)
RFC 2047	MIME Part Three: Encoded-Word in headers
RFC 2183	Content-Disposition header
RFC 2231	MIME Parameter Value and Encoded Word Extensions
RFC 8621 §4.1.4	JMAP for Mail — body structure algorithm

License

Licensed under either of MIT or Apache-2.0 at your option.
