mime-tree

RFC 5322/MIME parser producing a byte-range-indexed part tree

by Mark Atwood

4 releases

new 0.2.2 May 4, 2026
0.2.1 May 4, 2026
0.2.0 May 2, 2026
0.1.0 May 2, 2026

#566 in Parser implementations


Used in smime-tree

MIT/Apache

255KB
3.5K SLoC

License: MIT OR Apache-2.0 MSRV: 1.85

RFC 5322 / MIME parser that produces a walkable, byte-range-indexed part tree. Given raw message bytes, it returns a ParsedMessage with the full MIME structure, RFC 8621-compatible body views, and on-demand body decoding.

Why this crate exists

Most MIME parsers either give back owned strings (losing the original byte positions needed for S/MIME signature verification) or expose the underlying parsing library's types in their API (locking callers to that dependency). mime-tree gives you (offset, length) byte ranges into your original &[u8] buffer — so you can feed the exact bytes of a signed part directly to a cryptographic verifier without copying or re-encoding. The parsed result is fully owned, lifetime-free, and Serialize + Deserialize, so it round-trips through any store or message bus.
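
To make the byte-range idea concrete, here is a minimal sketch of turning an (offset, length) pair back into a borrowed slice of the original buffer. The helper name slice_range is hypothetical and not part of mime-tree; it only assumes ranges shaped like the (u32, u32) pairs described above.

```rust
/// Borrow the bytes described by an `(offset, length)` pair.
/// Returns `None` if the range does not fit inside `raw`.
/// Hypothetical helper for illustration; not part of the crate's API.
pub fn slice_range(raw: &[u8], range: (u32, u32)) -> Option<&[u8]> {
    let start = range.0 as usize;
    // checked_add guards against overflow on targets with a narrow usize.
    let end = start.checked_add(range.1 as usize)?;
    // `get` returns None instead of panicking on an out-of-bounds range.
    raw.get(start..end)
}
```

Because the result borrows from raw, the bytes handed to a verifier are the exact bytes that were received, with no copy or re-encoding in between.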

For S/MIME sign/verify/encrypt/decrypt, see the companion crate smime-tree.

Quick example

use mime_tree::{parse, decode_body_value};

let raw: &[u8] = b"From: alice@example.com\r\n\
                   Content-Type: text/plain; charset=utf-8\r\n\
                   \r\n\
                   Hello, world!\r\n";

let msg = parse(raw).expect("parse failed");

for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    let decoded = decode_body_value(raw, part, None).unwrap();
    println!("{}", decoded.value);
}

Key types

ParsedMessage

The result of parse(). All fields are owned; no lifetime parameters.

| Field | Type | Description |
| --- | --- | --- |
| part_index | ParsedPart | Root of the MIME part tree |
| text_body | Vec<String> | Part IDs of text/plain body parts (RFC 8621 §4.1.4) |
| html_body | Vec<String> | Part IDs of text/html body parts |
| attachments | Vec<String> | Part IDs of attachment parts |
| headers | Vec<ParsedHeader> | Top-level message headers |
| preview | Option<String> | First ~256 chars of text content |
| warnings | Vec<String> | Non-fatal parse warnings |

ParsedMessage implements Serialize + Deserialize.

ParsedPart

A single node in the MIME tree.

| Field | Type | Description |
| --- | --- | --- |
| part_id | String | IMAP dotted-path ID: "1", "1.1", "1.2", … |
| content_type | String | Media type/subtype, e.g. "text/plain" |
| charset | Option<String> | Charset from Content-Type, if present |
| transfer_encoding | TransferEncoding | See table below |
| disposition | Option<String> | Content-Disposition value |
| filename | Option<String> | Filename from Content-Disposition or Content-Type |
| cid | Option<String> | Content-ID header value |
| header_range | (u32, u32) | (offset, length) of part headers in original bytes |
| body_range | (u32, u32) | (offset, length) of part body (pre-decode) in original bytes |
| children | Vec<ParsedPart> | Child parts; non-empty for multipart/* only |

Byte ranges use u32 so the serialized representation is stable across 32-bit and 64-bit hosts. MIME messages are bounded well within 4 GiB.
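
As an illustration of how a tree with dotted-path IDs can be walked, the sketch below uses a hypothetical stand-in struct, Part, that mirrors only three of the fields listed above. It is not the crate's ParsedPart, and find_part and leaf_ranges are assumed names, not the crate's methods.

```rust
/// Hypothetical stand-in for a tree node; mirrors only three of the
/// documented `ParsedPart` fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Part {
    pub part_id: String,
    pub body_range: (u32, u32),
    pub children: Vec<Part>,
}

/// Depth-first search for a node by its dotted-path ID.
pub fn find_part<'a>(node: &'a Part, id: &str) -> Option<&'a Part> {
    if node.part_id == id {
        return Some(node);
    }
    node.children.iter().find_map(|child| find_part(child, id))
}

/// Collect the body ranges of all leaf parts in document order.
pub fn leaf_ranges(node: &Part, out: &mut Vec<(u32, u32)>) {
    if node.children.is_empty() {
        out.push(node.body_range);
    } else {
        for child in &node.children {
            leaf_ranges(child, out);
        }
    }
}
```

Since every node is owned and carries plain integers and strings, a walk like this works equally well on a tree that was just parsed and on one that was deserialized from storage.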

TransferEncoding variants

| Variant | CTE header value(s) |
| --- | --- |
| Identity | none / 7bit / 8bit / binary (also the fallback for unknown values) |
| QuotedPrintable | quoted-printable |
| Base64 | base64 |
| UUEncode | x-uuencode, x-uue, uuencode |
| SevenBit | 7bit |
| EightBit | 8bit |
| Binary | binary |

Unknown CTE values fall back to Identity and add a warning to ParsedMessage::warnings.
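
The fallback rule can be sketched as a small classifier. Everything here is an assumption made for illustration: the enum Cte and the function classify_cte are hypothetical stand-ins, and the sketch assumes that 7bit, 8bit, and binary map to their dedicated variants while Identity covers a missing header and unknown values. Matching is case-insensitive, as RFC 2045 requires for this header.

```rust
/// Hypothetical stand-in for the documented variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cte {
    Identity,
    QuotedPrintable,
    Base64,
    UuEncode,
    SevenBit,
    EightBit,
    Binary,
}

/// Map a Content-Transfer-Encoding header value to a variant.
/// Unknown values fall back to `Identity` and push a warning.
pub fn classify_cte(header: Option<&str>, warnings: &mut Vec<String>) -> Cte {
    // Assumption: a missing header is treated as Identity, without a warning.
    let Some(raw) = header else {
        return Cte::Identity;
    };
    match raw.trim().to_ascii_lowercase().as_str() {
        "quoted-printable" => Cte::QuotedPrintable,
        "base64" => Cte::Base64,
        "x-uuencode" | "x-uue" | "uuencode" => Cte::UuEncode,
        "7bit" => Cte::SevenBit,
        "8bit" => Cte::EightBit,
        "binary" => Cte::Binary,
        other => {
            warnings.push(format!("unknown Content-Transfer-Encoding: {other}"));
            Cte::Identity
        }
    }
}
```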

DecodedBodyValue

Returned by decode_body_value().

| Field | Type | Description |
| --- | --- | --- |
| value | String | Decoded, charset-converted text |
| is_truncated | bool | True if max_bytes limit was reached |
| is_encoding_problem | bool | True if transfer-decode or charset conversion encountered an error |

Decoding body content

decode_body_value slices the raw bytes using a part's body_range, applies transfer-encoding decode (Base64, Quoted-Printable, UUencode, etc.), and charset-converts the result to UTF-8 via encoding_rs. Decoding is on-demand — parse time is O(message size) and does not decode any bodies.

// Decode with a 64 KiB preview cap (pass None for unlimited).
let decoded = decode_body_value(raw, &part, Some(65_536))?;
if decoded.is_truncated {
    // body was larger than max_bytes
}
if decoded.is_encoding_problem {
    // transfer-decode or charset conversion hit an error; `value` may be partial
}
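
To show how the two flags can arise, here is a simplified, std-only sketch of the slice, transfer-decode, charset-convert pipeline for one case: quoted-printable input that is assumed to be UTF-8. The struct Decoded and the function decode_qp_preview are hypothetical stand-ins, not the crate's implementation, which the text above says uses encoding_rs for charset conversion.

```rust
/// Hypothetical stand-in mirroring the three documented
/// `DecodedBodyValue` fields.
#[derive(Debug, PartialEq)]
pub struct Decoded {
    pub value: String,
    pub is_truncated: bool,
    pub is_encoding_problem: bool,
}

fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decode quoted-printable bytes, stop once `max_bytes` decoded bytes have
/// been produced, then convert to a `String` assuming UTF-8.
pub fn decode_qp_preview(body: &[u8], max_bytes: Option<usize>) -> Decoded {
    let mut out = Vec::new();
    let mut problem = false;
    let mut truncated = false;
    let mut i = 0;
    while i < body.len() {
        // Input remains but the cap is reached: report truncation.
        if max_bytes.is_some_and(|cap| out.len() >= cap) {
            truncated = true;
            break;
        }
        if body[i] != b'=' {
            out.push(body[i]);
            i += 1;
            continue;
        }
        let rest = &body[i + 1..];
        if rest.starts_with(b"\r\n") {
            i += 3; // soft line break: "=" followed by CRLF
        } else if rest.starts_with(b"\n") {
            i += 2; // soft line break with a bare LF
        } else if let (Some(h), Some(l)) = (
            rest.first().copied().and_then(hex_val),
            rest.get(1).copied().and_then(hex_val),
        ) {
            out.push(h * 16 + l); // "=XX" hex escape
            i += 3;
        } else {
            problem = true; // malformed escape: keep the "=" literally
            out.push(b'=');
            i += 1;
        }
    }
    let value = match String::from_utf8(out) {
        Ok(s) => s,
        Err(e) => {
            problem = true; // invalid UTF-8: substitute replacement characters
            String::from_utf8_lossy(e.as_bytes()).into_owned()
        }
    };
    Decoded { value, is_truncated: truncated, is_encoding_problem: problem }
}
```

One limitation of this sketch: a cap that falls in the middle of a multi-byte UTF-8 character is reported as an encoding problem as well as a truncation.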

Inline UUencoded blocks

Some legacy messages — especially Usenet archives and mailing-list digests from the 1990s — embed UU-encoded files inside text/plain bodies with no Content-Transfer-Encoding header. Use scan_inline_uuencode to locate and decode those blocks:

use mime_tree::{parse, scan_inline_uuencode};

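// Placeholder path: load the raw message bytes from wherever you store them.
let bytes = std::fs::read("message.eml").unwrap();
let raw: &[u8] = &bytes;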
let msg = parse(raw).unwrap();

for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    for block in scan_inline_uuencode(raw, part) {
        if !block.is_encoding_problem {
            println!("found {} ({} bytes, mode {:o})",
                block.filename, block.data.len(), block.mode);
        }
    }
}

InlineUUBlock fields:

| Field | Type | Description |
| --- | --- | --- |
| begin_offset | u32 | Absolute byte offset of the begin line in raw |
| begin_length | u32 | Byte length of the entire block (through end\n) |
| mode | u32 | Unix permission mode from the begin line |
| filename | String | Filename from the begin line |
| data | Vec<u8> | Decoded binary payload |
| is_encoding_problem | bool | True if the block was truncated or malformed |
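
For readers unfamiliar with the format, the following std-only sketch shows where the mode, filename, and data values come from in a classic uuencoded block. The function names are hypothetical and this is not how mime-tree or the uuencoding crate implement it; it only follows the traditional encoding, where each character carries six bits as (c - 32) & 0x3f and the first character of a data line is the decoded byte count.

```rust
/// Parse a `begin <octal mode> <filename>` line into (mode, filename).
pub fn parse_begin_line(line: &str) -> Option<(u32, String)> {
    let rest = line.trim_end().strip_prefix("begin ")?;
    let (mode, name) = rest.split_once(' ')?;
    let mode = u32::from_str_radix(mode, 8).ok()?;
    Some((mode, name.to_string()))
}

/// Decode one uuencoded data line (without its line ending).
/// Returns `None` if the line is malformed or shorter than it claims.
pub fn uudecode_line(line: &[u8]) -> Option<Vec<u8>> {
    // Characters 32..=96 are valid; the backtick (96) is an alternative zero.
    let sextet = |c: u8| -> Option<u8> {
        if (32..=96).contains(&c) { Some((c - 32) & 0x3f) } else { None }
    };
    let (&len_char, data) = line.split_first()?;
    let len = sextet(len_char)? as usize;
    let mut out = Vec::with_capacity(len);
    for chunk in data.chunks(4) {
        // A short final group is padded with zero sextets.
        let mut s = [0u8; 4];
        for (slot, &c) in s.iter_mut().zip(chunk) {
            *slot = sextet(c)?;
        }
        // Four sextets become three bytes; high bits shifted out are dropped.
        out.push(s[0] << 2 | s[1] >> 4);
        out.push(s[1] << 4 | s[2] >> 2);
        out.push(s[2] << 6 | s[3]);
    }
    if out.len() < len {
        return None; // the length character promised more data than present
    }
    out.truncate(len);
    Some(out)
}
```

The well-known sample line #0V%T decodes to the three bytes of "Cat", and a line holding only a backtick decodes to an empty payload, which is how the terminating line before end is written.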

Inline yEnc blocks

Usenet binary posts from the 2000s onward typically use yEnc encoding with no Content-Transfer-Encoding header — the article body is simply text/plain with =ybegin/=yend framing embedded in it. Use scan_inline_yencode to locate and decode those blocks:

use mime_tree::{parse, scan_inline_yencode};

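// Placeholder path: load the raw article bytes from wherever you store them.
let bytes = std::fs::read("article.eml").unwrap();
let raw: &[u8] = &bytes;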
let msg = parse(raw).unwrap();

for id in &msg.text_body {
    let part = msg.part_index.find_by_id(id).unwrap();
    for block in scan_inline_yencode(raw, part) {
        if !block.is_encoding_problem {
            println!("found {} ({} bytes)", block.filename, block.data.len());
        }
    }
}

A reasonable heuristic before calling: check whether the part's decoded text contains the byte sequence b"=ybegin ".
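
That pre-check needs nothing beyond the standard library. A minimal sketch, where contains_ybegin is a hypothetical helper name and not part of the crate:

```rust
/// Cheap pre-check: does the body contain the yEnc begin marker anywhere?
/// Hypothetical helper for illustration; not part of the crate's API.
pub fn contains_ybegin(body: &[u8]) -> bool {
    const MARKER: &[u8] = b"=ybegin ";
    // `windows` yields nothing when the body is shorter than the marker.
    body.windows(MARKER.len()).any(|w| w == MARKER)
}
```

This is a linear scan with no allocation, so it is inexpensive to run on every text part before invoking the full scanner.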

InlineYEncBlock fields:

| Field | Type | Description |
| --- | --- | --- |
| begin_offset | u32 | Absolute byte offset of the =ybegin line in raw |
| begin_length | u32 | Byte length of the entire block (through =yend\n) |
| filename | String | Filename from =ybegin name= |
| file_size | u64 | Total file size from =ybegin size= |
| part | Option<u32> | Part number (multi-part only) |
| total_parts | Option<u32> | Total parts (multi-part only) |
| part_begin | Option<u64> | 1-based start offset in full file (multi-part only) |
| part_end | Option<u64> | 1-based end offset in full file (multi-part only) |
| data | Vec<u8> | Decoded binary payload |
| crc32_verified | bool | True if CRC32 was present and matched |
| is_encoding_problem | bool | True if the block was truncated, had a bad header, or CRC mismatch |

For multi-part reassembly, pass each InlineYEncBlock's fields to yencoding_multi::Assembler.
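
As background on what the filename, file_size, and data fields hold, here is a std-only sketch of the basic yEnc scheme: each byte is stored as (byte + 42) mod 256, and critical characters are written as = followed by the encoded value plus 64. The function names are hypothetical, CRC checking and =ypart handling are left out, and this is not the implementation used by mime-tree or the yencoding crate.

```rust
/// Extract `size=` and `name=` from an `=ybegin` header line.
/// `name=` is the last parameter and runs to the end of the line.
pub fn parse_ybegin(line: &str) -> Option<(u64, String)> {
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let rest = line.strip_prefix("=ybegin ")?;
    let (params, name) = rest.split_once("name=")?;
    let size: u64 = params
        .split_whitespace()
        .find_map(|kv| kv.strip_prefix("size="))?
        .parse()
        .ok()?;
    Some((size, name.to_string()))
}

/// Decode the data lines of one block (the bytes between the header
/// line(s) and `=yend`). Returns the payload and a problem flag.
pub fn ydecode(data: &[u8]) -> (Vec<u8>, bool) {
    let mut out = Vec::with_capacity(data.len());
    let mut problem = false;
    let mut iter = data.iter().copied();
    while let Some(b) = iter.next() {
        match b {
            b'\r' | b'\n' => {} // line breaks are framing, not data
            b'=' => match iter.next() {
                // Escaped byte: undo the +64, then the +42.
                Some(esc) => out.push(esc.wrapping_sub(64).wrapping_sub(42)),
                None => problem = true, // dangling escape at end of input
            },
            _ => out.push(b.wrapping_sub(42)),
        }
    }
    (out, problem)
}
```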

Design invariants

  • No JMAP dependency. General-purpose MIME parser; no jmap-mail-types.
  • No S/MIME crypto. application/pkcs7-mime and application/pkcs7-signature parts are treated as opaque binary leaves. Use smime-tree for S/MIME processing.
  • Best-effort parsing. Malformed input yields a partial result plus warnings; only truly unparsable input (empty bytes, no headers) returns Err.
  • No async. Synchronous only.
  • Byte ranges, not stored bytes. The crate never retains the raw message bytes.
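
Two of these invariants, best-effort parsing and byte ranges instead of stored bytes, can be illustrated with a toy header/body splitter. All names here (Split, SplitError, split_message) are hypothetical, and the rule that a first line without a colon means "no headers" is an assumption of this sketch, not a statement about the crate's behavior.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
pub enum SplitError {
    Empty,
    NoHeaders,
    TooLarge,
}

/// Owned result: only ranges and warnings, never the message bytes.
#[derive(Debug, PartialEq)]
pub struct Split {
    pub header_range: (u32, u32),
    pub body_range: (u32, u32),
    pub warnings: Vec<String>,
}

fn find(hay: &[u8], needle: &[u8]) -> Option<usize> {
    hay.windows(needle.len()).position(|w| w == needle)
}

pub fn split_message(raw: &[u8]) -> Result<Split, SplitError> {
    if raw.is_empty() {
        return Err(SplitError::Empty);
    }
    let len = u32::try_from(raw.len()).map_err(|_| SplitError::TooLarge)?;
    // Assumption: a first line without a colon means there are no headers.
    let first_line = raw.split(|&b| b == b'\n').next().unwrap_or(raw);
    if !first_line.contains(&b':') {
        return Err(SplitError::NoHeaders);
    }
    // (end of headers, start of body) per separator style; keep the earliest.
    let crlf = find(raw, b"\r\n\r\n").map(|i| (i + 2, i + 4));
    let lf = find(raw, b"\n\n").map(|i| (i + 1, i + 2));
    let sep = match (crlf, lf) {
        (Some(a), Some(b)) => Some(if a.0 <= b.0 { a } else { b }),
        (a, b) => a.or(b),
    };
    let mut warnings = Vec::new();
    // Malformed but usable input: record a warning instead of failing.
    let (head_end, body_start) = sep.unwrap_or_else(|| {
        warnings.push("no blank line after headers; body is empty".to_string());
        (raw.len(), raw.len())
    });
    Ok(Split {
        header_range: (0, head_end as u32),
        body_range: (body_start as u32, len - body_start as u32),
        warnings,
    })
}
```

The returned value holds no reference to raw, so the caller decides how long the original bytes live and slices them again only when a range is actually needed.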

Specification references

| RFC | Title |
| --- | --- |
| RFC 5322 | Internet Message Format |
| RFC 2045 | MIME Part One: Format of Internet Message Bodies |
| RFC 2046 | MIME Part Two: Media Types (multipart boundaries) |
| RFC 2047 | MIME Part Three: Encoded-Word in headers |
| RFC 2183 | Content-Disposition header |
| RFC 2231 | MIME Parameter Value and Encoded Word Extensions |
| RFC 8621 §4.1.4 | JMAP for Mail (body structure algorithm) |

License

Licensed under either of MIT or Apache-2.0 at your option.

Dependencies

~6MB
~172K SLoC

  • base64 0.22+alloc
  • encoding_rs
  • mail-parser 0.11+serde+full_encoding
  • quoted_printable
  • serde+derive
  • uuencoding
  • yencoding