Heavy Thinking as the Inner Skill in Agentic Harness
HeavySkill is a test-time scaling technique that decomposes complex reasoning into two stages:
- Parallel Reasoning — Generate K independent reasoning trajectories concurrently
- Sequential Deliberation — Synthesize trajectories through critical analysis into a superior final answer
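The two-stage decomposition can be sketched as a single async function. This is an illustrative stub, not the repository's actual API: `reason_once` and `deliberate` stand in for calls to the reasoning and deliberation models.

```python
import asyncio

# Stubs standing in for LLM calls; all names here are illustrative,
# not HeavySkill's real interfaces.
async def reason_once(query: str, seed: int) -> str:
    await asyncio.sleep(0)          # placeholder for network latency
    return f"trajectory-{seed}"     # fake reasoning trace

async def deliberate(query: str, trajectories: list[str]) -> str:
    # Stage 2 in the real pipeline is another LLM pass that cross-checks
    # the trajectories; here it just reports what it received.
    return f"synthesized from {len(trajectories)} trajectories"

async def heavy_think(query: str, k: int = 8) -> str:
    # Stage 1: K independent reasoning trajectories, generated concurrently.
    trajectories = await asyncio.gather(*(reason_once(query, i) for i in range(k)))
    # Stage 2: sequential deliberation over the collected trajectories.
    return await deliberate(query, trajectories)

final = asyncio.run(heavy_think("example query", k=8))
```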
This repository provides two modes of use:
| Mode | Description | Use Case |
|---|---|---|
| Workflow | Python async pipeline with CLI | Batch evaluation, research experiments, custom deployments |
| Skill | Pure prompt file for Claude Code / agentic harness | Interactive reasoning in AI-native IDEs |
- Heavy thinking consistently outperforms Best-of-N (majority voting) strategies
- Stronger LLMs can approach Pass@N performance through deliberation
- The depth (iterations) and width (K) of heavy thinking are scalable via RLVR
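For context, the Best-of-N (majority voting) baseline that heavy thinking is measured against can be sketched as:

```python
from collections import Counter

def majority_vote(final_answers):
    # Self-consistency baseline: keep only each sample's final answer and
    # return the most frequent one. The reasoning chains are discarded,
    # which is exactly what sequential deliberation improves on.
    return Counter(final_answers).most_common(1)[0][0]

winner = majority_vote(["42", "41", "42", "43", "42"])
```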
```bash
git clone https://github.com/wjn1996/HeavySkill.git
cd HeavySkill
pip install -e .
```

```bash
python scripts/run_heavyskill.py \
  --query "Find the number of paths of length 16 on an 8x8 grid that change direction exactly four times." \
  --model "deepseek-r1" \
  --api_base "http://localhost:8080" \
  --reason_k 8 \
  --summary_k 4 \
  --prompt_type "stem" \
  --output "outputs/result.json" \
  --verbose
```

Parameters:
- `--reason_k`: Number of parallel reasoning trajectories (default: 8)
- `--summary_k`: Number of deliberation samples (default: 4)
- `--iterations`: Iterative deliberation rounds (default: 1)
- `--prompt_type`: `"general"` or `"stem"`
- `--language`: `"en"` or `"cn"`
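Under the assumption (not verified against the code) that each deliberation round draws `summary_k` fresh samples, the per-query generation budget is roughly:

```python
def generation_budget(reason_k: int = 8, summary_k: int = 4, iterations: int = 1) -> int:
    # Assumed cost model: reason_k parallel trajectories, plus summary_k
    # deliberation samples per iterative round. Illustrative only.
    return reason_k + summary_k * iterations

defaults = generation_budget()            # 12 generations with the defaults
wider = generation_budget(16, 4, 2)       # 24 with reason_k=16, two rounds
```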
Using a separate deliberation model:
```bash
python scripts/run_heavyskill.py \
  --query "Your problem here" \
  --model "r1-distill-qwen-7b" \
  --api_base "http://localhost:8080" \
  --summary_model "qwen3-32b" \
  --summary_api_base "http://localhost:8081" \
  --reason_k 16 \
  --summary_k 4
```

Batch mode:
```bash
python scripts/run_heavyskill.py \
  --input_file "examples/example_math.json" \
  --model "deepseek-r1" \
  --api_base "http://localhost:8080" \
  --output "outputs/batch_result.json"
```

Copy the skill file into your Claude Code skills directory:

```bash
cp skill/heavyskill.md ~/.claude/skills/heavyskill.md
```

Then in Claude Code, the heavy thinking protocol will be available for complex reasoning tasks. The skill instructs the model to:
- Spawn multiple independent reasoning agents in parallel
- Collect diverse reasoning trajectories
- Perform critical meta-analysis and deliberation
- Output the synthesized final answer
```
HeavySkill/
├── workflow/                        # Mode 1: Python async pipeline
│   ├── config.py                    # Configuration dataclass
│   ├── parallel_reasoning.py        # Stage 1: Parallel trajectory generation
│   ├── sequential_deliberation.py   # Stage 2: Synthesis & deliberation
│   ├── memory_cache.py              # Trajectory storage & selection
│   ├── prompts.py                   # Prompt templates (general, STEM, CN/EN)
│   ├── pipeline.py                  # Full pipeline orchestration
│   ├── utils.py                     # Utilities (clipping, extraction, etc.)
│   └── agent/
│       ├── base.py                  # Abstract agent interface
│       └── openai_compatible.py     # OpenAI-compatible async API client
├── scripts/
│   ├── run_heavyskill.py            # CLI entry point
│   ├── run_heavyskill.sh            # Example shell script
│   └── evaluate.py                  # Simple accuracy evaluation
├── skill/
│   └── heavyskill.md                # Pure prompt skill for agentic harness
├── examples/
│   └── example_math.json            # Example input data
├── paper/
│   └── heavyskill.pdf               # Paper
├── requirements.txt
└── pyproject.toml
```
```
┌─────────────────────────────────────────────────────────┐
│                       User Query                        │
└─────────────────────┬───────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────┐
│              Stage 1: Parallel Reasoning                │
│                                                         │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐     ┌──────┐    │
│  │ Thinker 1│ │ Thinker 2│ │ Thinker 3│ ... │  K   │    │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘     └──┬───┘    │
│       │            │            │              │        │
└───────┼────────────┼────────────┼──────────────┼────────┘
        │            │            │              │
        ▼            ▼            ▼              ▼
┌─────────────────────────────────────────────────────────┐
│                     Memory Cache                        │
│           (Store & organize K trajectories)             │
└─────────────────────┬───────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────┐
│           Stage 2: Sequential Deliberation              │
│                                                         │
│  - Analyze answer distribution across trajectories      │
│  - Cross-validate reasoning chains                      │
│  - Identify logical errors & correct approaches         │
│  - Synthesize final answer with critical thinking       │
│                                                         │
│  ┌─── Iterative Update (optional) ◄──┐                  │
│  └───────────────────────────────────┘                  │
└─────────────────────┬───────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────┐
│                     Final Answer                        │
└─────────────────────────────────────────────────────────┘
```
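The memory-cache step between the two stages can be pictured as grouping trajectories by their extracted final answer, so the deliberator sees the answer distribution alongside the reasoning behind each candidate. This is a sketch; the repository's actual storage and selection logic lives in `workflow/memory_cache.py`.

```python
from collections import defaultdict

def organize_trajectories(trajectories):
    """Group (reasoning, answer) pairs by answer, most-supported first."""
    by_answer = defaultdict(list)
    for reasoning, answer in trajectories:
        by_answer[answer].append(reasoning)
    # Rank candidate answers by how many trajectories support them.
    return sorted(by_answer.items(), key=lambda kv: -len(kv[1]))

ranked = organize_trajectories([
    ("chain A", "42"), ("chain B", "41"), ("chain C", "42"),
])
# ranked[0] is the best-supported candidate with its supporting chains
```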
The workflow supports any OpenAI-compatible API endpoint:
- vLLM serving (`--api_base http://localhost:8000`)
- DeepSeek API (`--api_base https://api.deepseek.com`)
- Together AI (`--api_base https://api.together.xyz`)
- OpenRouter (`--api_base https://openrouter.ai/api`)
- Local Ollama (`--api_base http://localhost:11434`)
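These backends all expose the same HTTP surface: a POST to the standard `/v1/chat/completions` route with a JSON body. The sketch below only builds the request without sending it; whether HeavySkill itself appends `/v1` to `--api_base` is an assumption here, and the model name is a placeholder.

```python
import json

def build_chat_request(api_base: str, model: str, prompt: str, n: int = 1):
    # OpenAI-compatible servers accept POST {base}/v1/chat/completions.
    url = f"{api_base.rstrip('/')}/v1/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "n": n,  # sample several trajectories from one request
    })
    return url, body

url, body = build_chat_request("http://localhost:8000", "deepseek-r1", "2 + 2 = ?", n=8)
```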
```bibtex
@article{wang2026heavyskill,
  title={HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness},
  author={Wang, Jianing and Guo, Linsen and Chen, Zhengyu and Guo, Qi and Zang, Hongyu and Shi, Wenjie and Ma, Haoxiang and Xi, Xiangyu and Li, Xiaoyu and Wang, Wei and Cai, Xunliang},
  journal={arXiv preprint arXiv:2605.02396},
  year={2026},
  url={https://arxiv.org/abs/2605.02396}
}
```

Apache-2.0