riker

Fast Rust CLI toolkit for sequencing QC metrics -- ports key QC metrics tools from Picard with cleaner output and better performance.

Visit us at Fulcrum Genomics to learn more about how we can power your bioinformatics with riker and beyond.

Warning: ALPHA SOFTWARE - USE AT YOUR OWN RISK

This software is currently in ALPHA. While we have extensively tested these tools across a wide variety of data, no guarantees are made regarding correctness or stability.

Overview

Riker is the spiritual successor to Picard for next-generation sequencing QC. It aims to provide modern, fast implementations of NGS QC metrics that tell you both "how does your sample look?" and often "why does it look bad?". It is not intended to be a drop-in replacement for Picard. Notably:

Command line structure is different
Output file formats are simplified and column names changed to resolve inconsistencies and confusion (and to use lower-case)
Bugs that are difficult to fix in Picard due to implementation choices have been fixed in Riker, yielding slightly different outputs

Instead, Riker provides tools broadly similar to many of Picard's most widely used QC tools, with the goal of leading you to the same conclusions as before in a lot less time. Riker will remain focused on QC and will not attempt to replicate Picard's other tools, some of which have already been re-implemented, and improved, elsewhere (e.g. fgumi's fastq, sort, dedup, and zipper tools).

See the Available Tools section for a list of current tools.

Motivation

The obvious question is: why not just fix up Picard? Riker exists for a number of reasons:

Speed: Most tools in riker are 4-6x faster than their counterparts in Picard; some are much faster. A fresh start also made it much easier to make every tool runnable via a single multi command, reducing complexity and saving further compute time (in Picard, for example, the equivalents of the wgs, hybcap, and error tools each require a separate invocation).
Cleaner Output: Picard's tools write an inconvenient, mostly-TSV format with a variable number of #-commented header lines and sometimes multiple independent tables in the same file, making them harder than they should be to parse programmatically and annoying to review manually. Riker's outputs are simple TSVs with one table per output file that can be piped into cat file | column -t or read with Python's csv.DictReader with no fuss. In addition, Riker puts the sample in the first column of every file, so files for many samples can easily be concatenated.
Lightweight Distribution: Picard carries a lot of baggage. It needs a JVM. It needs R and much of the tidyverse in order to produce plots. The bioconda distribution requires a Python interpreter to run its wrapper script. Running pixi init && pixi add picard results in a 1.2GB environment! In contrast, Riker is distributed as a single executable of < 10MB with no external dependencies.
Maintenance: Maintenance of Picard has been minimal for some time, with what little activity there is coming mostly from the community. Picard is owned by, but no longer actively led by, the Broad Institute, making its path forward unclear. Picard also carries some now-unnecessary complexity built up over years of limited maintenance. All of this makes a fresh start more appealing.

Performance

Numbers below are from a reproducible benchmark pipeline run on 2026-05-06 on a single AWS r8id.xlarge instance (4 vCPU, 32 GB RAM, local NVMe), against publicly-available 1000 Genomes 30× WGS BAMs (transcoded from CRAM) and the GIAB Ashkenazi exome trio.

Tool versions:

Riker: 0.2.0 release candidate (built from source on the host)
Picard: 3.4.0 on OpenJDK 25.0.2 (bioconda)
mosdepth: 0.3.14 (bioconda)
samtools: 1.23.1 (bioconda; used for staging — CRAM→BAM transcoding, downsampling with view --subsample, indexing)

WGS — Riker wgs vs. Picard CollectWgsMetrics

Sample	BAM	Cov	Riker Wall	Picard Wall	Speedup	Riker RSS	Picard RSS
HG02675	7.1 GB	4×	0:56	12:24	13.2×	0.74 GB	1.56 GB
HG02675	21.7 GB	15×	2:56	32:20	11.0×	0.74 GB	5.97 GB
HG02675	28.1 GB	20×	3:47	40:24	10.7×	0.74 GB	5.58 GB
HG00188	37.5 GB	30×	5:13	1:02:30	12.0×	0.74 GB	5.24 GB
HG02675	41.1 GB	30×	5:32	1:01:20	11.1×	0.74 GB	5.20 GB

WGS — Riker multi vs. Picard Collect*Metrics

Sample	BAM	Cov	Riker Wall	Picard Wall (CMM + CWM)	Speedup	Riker RSS	Picard peak RSS
HG02675	7.1 GB	4×	1:25	21:16 (8:44 + 12:32)	15.1×	1.44 GB	1.60 GB
HG02675	21.7 GB	15×	4:34	58:38 (26:32 + 32:06)	12.8×	1.47 GB	5.97 GB
HG02675	28.1 GB	20×	5:58	1:15:12 (34:21 + 40:51)	12.6×	1.48 GB	5.58 GB
HG00188	37.5 GB	30×	8:14	1:46:16 (46:36 + 59:40)	12.9×	1.48 GB	5.24 GB
HG02675	41.1 GB	30×	8:45	1:51:55 (51:13 + 60:42)	12.8×	1.47 GB	5.20 GB

Riker was tested with a single invocation of riker multi --tools wgs gcbias alignment basic isize --threads 4. Picard was run twice, once for CollectWgsMetrics and once for CollectMultipleMetrics to generate a matching set of outputs.

"Picard peak RSS" is the larger of the two sequential JVM runs — typically dominated by CollectWgsMetrics, which scales with genome size + coverage.

WGS — Riker wgs vs. mosdepth

Sample	BAM	Cov	Riker Wall	mosdepth Wall	Δ	Riker RSS	mosdepth RSS
HG02675_4x	7.1 GB	4×	0:56	1:13	Riker 23 % faster	0.74 GB	3.07 GB
HG02675_15x	21.7 GB	15×	2:56	2:29	mosdepth 15 % faster	0.74 GB	2.42 GB
HG02675_20x	28.1 GB	20×	3:47	2:59	mosdepth 21 % faster	0.74 GB	2.48 GB
HG00188_30x	37.5 GB	30×	5:13	3:48	mosdepth 27 % faster	0.74 GB	2.52 GB
HG02675_30x	41.1 GB	30×	5:32	4:05	mosdepth 26 % faster	0.74 GB	2.59 GB

Both tools were run single-threaded. mosdepth was run with its default -t 0 (zero extra decompression threads) and with --no-per-base.

Surprisingly, Riker outperforms mosdepth on the low-coverage (4×) sample, while mosdepth wins at higher depths. The 15-27% gap at higher depths is largely explained by the fact that Riker performs per-base quality score filtering and quality-score-aware mate-overlap computation, whereas mosdepth does not examine quality scores.
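The Speedup and Δ columns in the tables above are not defined explicitly. The published figures appear consistent with speedup = Picard wall / Riker wall, and Δ = (slower wall - faster wall) / slower wall. A minimal Rust sketch of that arithmetic, assuming wall times written as m:ss or h:mm:ss as in the tables (the function names are illustrative, not part of Riker):

```rust
/// Parse a wall-clock string such as "5:13" (m:ss) or "1:02:30" (h:mm:ss) into seconds.
fn wall_to_secs(wall: &str) -> u32 {
    wall.split(':')
        .map(|part| part.parse::<u32>().expect("numeric time component"))
        .fold(0, |acc, v| acc * 60 + v)
}

/// Speedup of `fast` relative to `slow`, e.g. Picard wall / Riker wall.
fn speedup(slow: &str, fast: &str) -> f64 {
    wall_to_secs(slow) as f64 / wall_to_secs(fast) as f64
}

/// Percent by which the faster run beats the slower one, measured against
/// the slower run's time: (slower - faster) / slower * 100.
fn percent_faster(a: &str, b: &str) -> f64 {
    let (a, b) = (wall_to_secs(a) as f64, wall_to_secs(b) as f64);
    let (slow, fast) = if a > b { (a, b) } else { (b, a) };
    (slow - fast) / slow * 100.0
}
```

For example, percent_faster("0:56", "1:13") gives about 23.3, in line with the "Riker 23 % faster" entry for HG02675_4x.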

Hybcap — Riker hybcap vs. Picard CollectHsMetrics

Hybcap measurements are the mean of three GIAB Ashkenazi trio samples (HG002, HG003, HG004), each a ~9.8 GB exome BAM aligned to hs37d5 with the Agilent SureSelect Human All Exon V5 capture kit.

Sample	BAM	Kit	Riker Wall	Picard Wall	Speedup	Riker RSS	Picard RSS
AJ trio (mean)	~9.8 GB	Agilent v5	1:26	14:12	9.9×	0.98 GB	2.45 GB

Hybcap — Riker multi vs. Picard Collect*Metrics

Sample	BAM	Kit	Riker Wall	Picard Wall (CMM + CHsM)	Speedup	Riker RSS	Picard peak RSS
AJ trio (mean)	~9.8 GB	Agilent v5	1:45	22:07 (7:57 + 14:09)	12.6×	0.99 GB	3.23 GB

Riker was tested with a single invocation of riker multi --tools hybcap alignment basic isize --threads 4. Picard was run twice, once for CollectHsMetrics and once for CollectMultipleMetrics to generate a matching set of outputs.

"Picard peak RSS" is the larger of the two sequential JVM runs — dominated by CollectHsMetrics on the hybcap trio.

Installation

Install from bioconda

Using pixi, after adding the bioconda channel to your configuration:

pixi add riker

Or using your favorite conda client:

conda install -c bioconda riker

Installing with Cargo

Riker can be installed from crates.io. If you're unfamiliar with cargo (the Rust build tool) but want to go this route, start by installing rustup.

cargo install riker-ngs

Building from source

Building from source also requires cargo. Once it is installed:

# clone the repo
git clone https://github.com/fulcrumgenomics/riker

# build the release version:
cd riker
cargo build --release

Available Tools

Command	Description	Equivalent Tool(s)
`alignment`	Collect alignment summary metrics	`picard CollectAlignmentSummaryMetrics`
`basic`	Collect base distribution, mean quality, and quality score distribution	`picard CollectBaseDistributionByCycle`, `MeanQualityByCycle`, `QualityScoreDistribution`
`error`	Collect base-level error metrics (mismatch, overlap, indel)	`picard CollectSamErrorMetrics`
`isize`	Collect insert size distribution metrics	`picard CollectInsertSizeMetrics`
`wgs`	Collect whole-genome coverage metrics	`picard CollectWgsMetrics`
`gcbias`	Collect GC bias metrics	`picard CollectGcBiasMetrics`
`hybcap`	Collect hybrid capture (bait/target) metrics	`picard CollectHsMetrics`
`multi`	Run multiple collectors in a single BAM pass	`picard CollectMultipleMetrics`
`docs`	Print metric field documentation	--

Usage

For detailed usage of each command, run:

riker <command> --help

Examples

Collect basic QC metrics (base distribution, mean quality, quality score distribution):

riker basic -i sample.bam -o out_prefix

Collect alignment summary metrics with a reference:

riker alignment -i sample.bam -r ref.fa -o out_prefix

Collect base-level error metrics with stratification:

riker error -i sample.bam -r ref.fa -o out_prefix
riker error -i sample.bam -r ref.fa -o out_prefix --vcf known.vcf.gz --stratify-by read_num,cycle bq

Collect insert size metrics:

riker isize -i sample.bam -o out_prefix

Collect whole-genome coverage metrics:

riker wgs -i sample.bam -r ref.fa -o out_prefix
riker wgs -i sample.bam -r ref.fa -o out_prefix -L intervals.bed

Collect GC bias metrics:

riker gcbias -i sample.bam -r ref.fa -o out_prefix

Collect hybrid capture metrics:

riker hybcap -i sample.bam -o out_prefix --baits baits.bed --targets targets.bed
riker hybcap -i sample.bam -o out_prefix --baits baits.bed --targets targets.bed -r ref.fa

Run multiple collectors in a single pass:

riker multi \
  -i sample.bam \
  -r ref.fa \
  -o out_prefix \
  --tools alignment isize basic hybcap \
  --hybcap::baits baits.bed \
  --hybcap::targets targets.bed

Output Format

Riker produces PDF plots using kuva, and plain TSV output designed for easy downstream consumption:

Lowercase snake_case headers (e.g., total_reads, mean_insert_size)
Tab-separated with no metadata or comment lines
Fractions use frac_ prefix instead of pct_ (e.g., frac_aligned not pct_aligned)
No per read-group or library breakdown -- all reads in the file are combined

For detailed documentation of the columns in each output file type, run riker docs.
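Because each output file is a single tab-separated table with one header line and no comment lines, it can be read without a special parser. A minimal, std-only Rust sketch follows; it assumes no quoting or embedded tabs, and the column names in the usage example (sample, total_reads, frac_aligned) are placeholders, so check riker docs for the real ones:

```rust
use std::collections::HashMap;

/// Parse a simple TSV (one header line, tab-separated rows, no comment lines)
/// into one map per row, keyed by column name.
fn parse_tsv(text: &str) -> Vec<HashMap<String, String>> {
    let mut lines = text.lines().filter(|l| !l.is_empty());
    let header: Vec<&str> = match lines.next() {
        Some(h) => h.split('\t').collect(),
        None => return Vec::new(),
    };
    lines
        .map(|line| {
            header
                .iter()
                .zip(line.split('\t'))
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect::<HashMap<String, String>>()
        })
        .collect()
}
```

Given the text "sample\ttotal_reads\tfrac_aligned\nS1\t1000\t0.98\n", the first row's "frac_aligned" entry is "0.98".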

Key Differences vs. Picard

Riker aims to produce metrics matching those of Picard's equivalent tools, except where a deliberate difference is documented. This section documents those known and expected functional differences between Riker and Picard.

Differences in alignment vs. CollectAlignmentSummaryMetrics

`mean_aligned_read_length`

Picard computes mean aligned read length over all PF reads, including unmapped reads, which contribute zero to the sum. The denominator is PF_READS, so unmapped reads dilute the average toward zero.

riker uses aligned_reads as the denominator, giving the mean length of reads that actually aligned.

Impact: riker produces slightly higher values than Picard — typically ~0-2bp for WGS data with high alignment rates. The difference grows with the fraction of unmapped reads.
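To make the denominator difference concrete, here is a minimal Rust sketch of both calculations over a hypothetical per-read summary type (ReadSummary). It assumes each read's aligned length is already known and ignores secondary and supplementary handling:

```rust
/// Minimal per-read summary needed for this comparison (hypothetical type).
struct ReadSummary {
    pf: bool,         // passes vendor filter
    mapped: bool,     // read is aligned
    aligned_len: u64, // number of aligned bases (0 when unmapped)
}

/// Picard-style: sum of aligned lengths divided by all PF reads.
fn mean_aligned_len_picard(reads: &[ReadSummary]) -> f64 {
    let pf: Vec<&ReadSummary> = reads.iter().filter(|r| r.pf).collect();
    let sum: u64 = pf.iter().map(|r| r.aligned_len).sum();
    if pf.is_empty() { 0.0 } else { sum as f64 / pf.len() as f64 }
}

/// riker-style: the same sum divided by PF reads that actually aligned.
fn mean_aligned_len_riker(reads: &[ReadSummary]) -> f64 {
    let aligned: Vec<&ReadSummary> = reads.iter().filter(|r| r.pf && r.mapped).collect();
    let sum: u64 = aligned.iter().map(|r| r.aligned_len).sum();
    if aligned.is_empty() { 0.0 } else { sum as f64 / aligned.len() as f64 }
}
```

With two PF reads of 150 aligned bases plus one unmapped PF read, the Picard-style mean is 100.0 while the riker-style mean is 150.0.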

`reads_improperly_paired` / `frac_reads_improperly_paired`

Picard counts all mapped, paired, non-proper reads as improperly paired, including reads whose mate is unmapped (no is_mate_unmapped() guard).

riker requires the mate to also be mapped before counting a read toward aligned_reads_in_pairs or reads_improperly_paired. This avoids inflating the improper-pair count with reads whose mate simply failed to align.

Impact: riker reports fewer improperly paired reads than Picard. On typical WGS data the difference is small, usually in the single-digit percent range. aligned_reads_in_pairs and frac_reads_improperly_paired are also affected.
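The two counting rules differ by a single condition. A minimal Rust sketch, using a hypothetical PairFlags struct in place of real SAM flag decoding and ignoring PF, secondary, and supplementary filtering:

```rust
/// Flags needed to classify a mapped read (hypothetical subset of SAM flags).
struct PairFlags {
    paired: bool,
    mapped: bool,
    mate_mapped: bool,
    proper_pair: bool,
}

/// Picard-style: any mapped, paired read without the proper-pair flag.
fn improper_picard(f: &PairFlags) -> bool {
    f.paired && f.mapped && !f.proper_pair
}

/// riker-style: additionally require the mate to be mapped.
fn improper_riker(f: &PairFlags) -> bool {
    f.paired && f.mapped && f.mate_mapped && !f.proper_pair
}

/// riker-style membership in aligned_reads_in_pairs (mate must be mapped too).
fn aligned_in_pair_riker(f: &PairFlags) -> bool {
    f.paired && f.mapped && f.mate_mapped
}
```

A mapped read whose mate is unmapped (an orphan) is improperly paired under the Picard-style rule but not under the riker-style rule.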

Differences in hybcap vs. CollectHsMetrics

`frac_exc_overlap` (Picard: `PCT_EXC_OVERLAP`)

riker reports a slightly lower frac_exc_overlap than Picard — typically below 1% relative difference (e.g. 0.032542 vs 0.032646 on a 1000G exome dataset).

Cause: Picard's overlap-clipping function (SAMUtils.getNumOverlappingAlignedBasesToClip() in htsjdk) does not verify that the read and its mate are mapped to the same contig before computing the overlap. It compares getMateAlignmentStart() against getAlignmentStart() as raw integers, regardless of reference sequence. For chimeric read pairs, where one end maps to a different chromosome, Picard may compute a spurious overlap when the mate's coordinate on the other contig happens to fall within the range of the read's aligned coordinates. See broadinstitute/picard#2039.

riker's equivalent function checks reference_sequence_id != mate_reference_sequence_id and returns 0 for chimeric pairs, correctly skipping overlap clipping when the reads are on different contigs.

Impact: frac_exc_overlap and frac_exc_off_target are affected; the latter because bases that riker correctly does not count as overlapping are usually excluded as off-target instead. The magnitude depends on the fraction of chimeric read pairs in the data, but is usually very small.
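The effect of the missing contig check can be illustrated with a deliberately simplified overlap calculation. This is not htsjdk's SAMUtils.getNumOverlappingAlignedBasesToClip() logic; it is a Rust sketch that assumes 1-based inclusive coordinates, a hypothetical Aln struct, and that only the portion of this read at or after its mate's start counts as overlap:

```rust
/// Minimal alignment coordinates (hypothetical type; 1-based, inclusive).
struct Aln {
    ref_id: i32,
    start: i64,
    end: i64,
    mate_ref_id: i32,
    mate_start: i64,
}

/// Simplified count of this read's aligned bases that overlap its mate.
/// `check_contig` toggles the same-contig guard described above.
fn overlap_bases(a: &Aln, check_contig: bool) -> i64 {
    if check_contig && a.ref_id != a.mate_ref_id {
        return 0; // chimeric pair: mates on different contigs cannot overlap
    }
    // Without the guard, mate_start is compared as a raw integer
    // even when it refers to a different contig.
    if a.mate_start >= a.start && a.mate_start <= a.end {
        a.end - a.mate_start + 1
    } else {
        0
    }
}
```

For a read spanning 1000-1149 on one contig whose mate starts at 1100 on another contig, the unchecked version reports 50 overlapping bases and the checked version reports 0.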

Per-target GC content and GC dropout

riker and Picard compute per-target GC fraction differently when a target's reference sequence contains ambiguous (N) bases.

Picard counts N bases in the denominator but not the numerator: gc_fraction = (G + C) / (A + T + G + C + N). This dilutes the GC fraction toward zero in proportion to the number of N bases in the target.

riker treats N bases as maximally uncertain, contributing 0.5 to the GC numerator and 1.0 to the denominator: gc_fraction = (G + C + 0.5 * N) / (A + T + G + C + N). A target that is entirely N bases reports a GC fraction of 0.5 (maximum uncertainty) rather than 0.0.

Neither approach is fully correct — the true GC content of N bases is unknown — but riker's treatment avoids the bias toward low GC that Picard introduces. In practice the difference is only visible for targets where N bases make up a meaningful fraction of the reference sequence.

Impact: Per-target GC values can differ for N-containing targets, which in turn affects the GC bias curve and the at_dropout and gc_dropout summary metrics. All other per-target and summary metrics are unaffected.
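Both formulas can be computed from the same base counts. A minimal Rust sketch, assuming the target's reference sequence is available as ASCII bytes; soft-masked lowercase bases are upper-cased, and IUPAC codes other than N are simply ignored here, which may not match either tool:

```rust
/// Return (Picard-style, riker-style) GC fractions for a target's reference bases.
fn gc_fractions(seq: &[u8]) -> (f64, f64) {
    let (mut gc, mut at, mut n) = (0u64, 0u64, 0u64);
    for &b in seq {
        match b.to_ascii_uppercase() {
            b'G' | b'C' => gc += 1,
            b'A' | b'T' => at += 1,
            b'N' => n += 1,
            _ => {} // other IUPAC codes are ignored in this sketch
        }
    }
    let total = (gc + at + n) as f64;
    if total == 0.0 {
        return (0.0, 0.0); // arbitrary choice for an empty target
    }
    let picard = gc as f64 / total; // (G + C) / (A + T + G + C + N)
    let riker = (gc as f64 + 0.5 * n as f64) / total; // (G + C + 0.5 * N) / (...)
    (picard, riker)
}
```

For a target of GGCCNNNN, the Picard-style value is 0.5 and the riker-style value is 0.75; for an all-N target they are 0.0 and 0.5.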

Differences in error vs. CollectSamErrorMetrics

Reference N bases

Picard counts bases at reference-N positions toward TOTAL_BASES (and may count them as errors since the read base does not match N).

riker skips positions where the reference base is N, since errors cannot be meaningfully assessed without a known reference base.

Impact: riker reports slightly fewer total_bases and error_bases than Picard. The difference is proportional to the number of reference-N positions covered by reads — typically negligible for well-assembled references.
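A minimal Rust sketch of the difference, assuming aligned (read base, reference base) pairs have already been extracted and ignoring quality filtering. Treating a mismatch against N as an error in the unskipped path reflects the "may count them as errors" case above; it is not a verified reproduction of Picard:

```rust
/// Tally (total_bases, error_bases) over aligned (read_base, ref_base) pairs.
/// When `skip_ref_n` is true, positions whose reference base is N are ignored.
fn tally_mismatches(pairs: &[(u8, u8)], skip_ref_n: bool) -> (u64, u64) {
    let mut total = 0;
    let mut errors = 0;
    for &(read, reference) in pairs {
        let reference = reference.to_ascii_uppercase();
        if skip_ref_n && reference == b'N' {
            continue; // no known reference base, so no error can be assessed
        }
        total += 1;
        if read.to_ascii_uppercase() != reference {
            errors += 1;
        }
    }
    (total, errors)
}
```

With pairs A/A, C/T, G/N, T/T, the unskipped tally is 4 bases and 2 errors, while skipping reference N gives 3 bases and 1 error.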

Mismatch `total_bases` includes insertion bases in Picard

Picard's mismatch metric (ERROR) includes insertion bases in TOTAL_BASES. This happens because BaseErrorCalculator.addBase() increments nBases for both aligned (Match) bases and insertion bases, and SimpleErrorCalculator inherits this count as the denominator.

riker counts only aligned bases in the mismatch total_bases. Insertion bases are counted separately in the indel metric.

Impact: Picard's mismatch TOTAL_BASES is higher than riker's by the number of inserted bases passing filters. The mismatch error_bases (numerator) is identical between the two tools — only the denominator differs. riker's separation is arguably cleaner since insertion bases are not relevant to the substitution error rate.
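The denominator difference follows directly from the CIGAR. A minimal Rust sketch, assuming a well-formed CIGAR string and ignoring base-quality filters (so it counts every M/=/X and I base rather than only bases passing filters):

```rust
/// Split a CIGAR string (e.g. "50M2I48M") into (length, op) pairs.
fn parse_cigar(cigar: &str) -> Vec<(u64, char)> {
    let mut ops = Vec::new();
    let mut len = 0u64;
    for c in cigar.chars() {
        if let Some(d) = c.to_digit(10) {
            len = len * 10 + d as u64;
        } else {
            ops.push((len, c));
            len = 0;
        }
    }
    ops
}

/// Mismatch-metric denominators for one read:
/// (Picard-style total including insertion bases, riker-style aligned-only total).
fn mismatch_denominators(cigar: &str) -> (u64, u64) {
    let mut aligned = 0;
    let mut inserted = 0;
    for (len, op) in parse_cigar(cigar) {
        match op {
            'M' | '=' | 'X' => aligned += len,
            'I' => inserted += len,
            _ => {} // S, H, D, N and P add nothing to either total here
        }
    }
    (aligned + inserted, aligned)
}
```

For 50M2I48M, the Picard-style mismatch total is 100 and the riker-style total is 98; the mismatch count itself would be the same in both.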

Insert size stratification

Picard caps insert size at readLength * 10 for the INSERT_LENGTH stratifier. Reads with larger absolute insert sizes (e.g., chimeric pairs) are binned at the cap value.

riker excludes reads with absolute insert size above --max-isize (default 1000) from the isize stratifier entirely. These reads are still counted by all other stratifiers and in the all group. The threshold can be adjusted via --max-isize.

Impact: Picard may have a large bin at the cap value containing chimeric reads, while riker omits them. Overall error rates are unaffected.
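The two binning rules can be sketched as follows. This is a simplified Rust illustration with hypothetical function names; returning None means the read is left out of the isize stratifier only, while other stratifiers would still count it:

```rust
/// Picard-style INSERT_LENGTH value: |insert size| capped at read_length * 10.
fn picard_isize_bin(insert_size: i64, read_length: i64) -> i64 {
    insert_size.abs().min(read_length * 10)
}

/// riker-style isize value: None when |insert size| exceeds `max_isize`
/// (the --max-isize option, default 1000 per the text above).
fn riker_isize_bin(insert_size: i64, max_isize: i64) -> Option<i64> {
    let abs = insert_size.abs();
    if abs > max_isize { None } else { Some(abs) }
}
```

With 150 bp reads, an insert size of -5000 lands in Picard's 1500 bin, while the riker-style function returns None under the default --max-isize of 1000.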

Insertions at read start

Picard counts insertions that occur before any aligned base in the read (e.g., CIGARs starting with nI or nSnI). The locus iterator attaches these to the preceding reference position.

riker skips insertions that have no preceding aligned base (no anchor position), since there is no reference context for stratification.

Impact: Picard reports slightly more insertions and inserted bases than riker. The difference is small — typically a few hundred events out of tens of thousands. Deletion counts are unaffected and match exactly between the tools.
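Whether an insertion has an anchor can be decided by walking the CIGAR operations in order. A minimal Rust sketch over pre-parsed (length, operation) pairs, assuming only M, = and X count as aligned bases; it illustrates the rule described above rather than riker's actual code:

```rust
/// Count insertion events in a CIGAR op list, returning
/// (all insertions, insertions preceded by at least one aligned base).
fn count_insertions(ops: &[(u64, char)]) -> (u32, u32) {
    let mut seen_aligned = false;
    let (mut all, mut anchored) = (0, 0);
    for &(_, op) in ops {
        match op {
            'M' | '=' | 'X' => seen_aligned = true,
            'I' => {
                all += 1;
                if seen_aligned {
                    anchored += 1; // has a preceding reference position
                }
            }
            _ => {} // soft clips and other ops neither anchor nor count
        }
    }
    (all, anchored)
}
```

For 5S1I50M3I44M there are two insertion events, but only the second has a preceding aligned base, so a riker-style count keeps 1 of the 2.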

Q-score computation

Picard computes Q-scores using a Bayesian prior: Q = -10 * log10((errors + 0.001) / (total + 1)), rounded to the nearest integer. The prior (configurable via PRIOR_Q, default 30) prevents infinite Q-scores when there are zero errors.

riker computes Q-scores from the raw error rate: Q = -10 * log10(errors / total), reported to two decimal places. When there are zero errors, a cap of Q99 is used.

Impact: Q-scores differ slightly due to the prior and rounding. The underlying counts (numerator and denominator) are comparable; only the derived Q-score differs.
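The two formulas are easy to compare side by side. A minimal Rust sketch that transcribes the expressions above, using the fixed 0.001 and +1 pseudo-counts given for Picard's default prior rather than a general PRIOR_Q parameter:

```rust
/// Picard-style Q-score as given above, rounded to the nearest integer.
fn q_picard(errors: u64, total: u64) -> i64 {
    let q = -10.0 * ((errors as f64 + 0.001) / (total as f64 + 1.0)).log10();
    q.round() as i64
}

/// riker-style Q-score from the raw error rate, rounded to two decimals,
/// with Q99 used when there are zero errors.
fn q_riker(errors: u64, total: u64) -> f64 {
    if errors == 0 {
        return 99.0;
    }
    let q = -10.0 * (errors as f64 / total as f64).log10();
    (q * 100.0).round() / 100.0
}
```

With 3 errors in 1000 bases, the Picard-style value is 25 and the riker-style value is 25.23; with 1 error in 1000 bases both give 30.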

Stratifiers not ported from Picard

Picard's CollectSamErrorMetrics defines 32 stratifiers. riker ports 15 of them (all, bq, mapq, cycle, read_num, strand, pair_orientation, isize, gc, read_base, ref_base, hp_len, pre_dinuc, post_dinuc, context_3bp). The following Picard stratifiers are not available in riker:

PAIR_PROPERNESS — whether the read is in a proper pair
HOMOPOLYMER — the homopolymer base (A/C/G/T) at the current position
BINNED_HOMOPOLYMER — homopolymer length bucketed into bins
BINNED_CYCLE — machine cycle bucketed into bins
SOFT_CLIPS — number of soft-clipped bases in the read
MISMATCHES_IN_READ — total mismatches in the read (nm is similar)
TWO_BASE_PADDED_CONTEXT — 5-base context (2bp each side) (context_3bp is provided)
CONSENSUS — whether the read is a consensus/duplex read
NS_IN_READ — number of N bases in the read
INSERTIONS_IN_READ — number of insertion events in the read
DELETIONS_IN_READ — number of deletion events in the read
INDELS_IN_READ — number of indel events in the read
FLOWCELL_TILE — flowcell tile from the read name
FLOWCELL_X — flowcell X coordinate from the read name
FLOWCELL_Y — flowcell Y coordinate from the read name
READ_GROUP — the read group of the read

Authors

Tim Fennell

Disclaimer

This software is under active development. While we make a best effort to test this software and to fix issues as they are reported, this software is provided as-is without any warranty (see the license for details). Please submit an issue, and better yet a pull request as well, if you discover a bug or identify a missing feature. Please contact Fulcrum Genomics if you are considering using this software or are interested in sponsoring its development.
