
Commit 1bdf4d2

[hardware, recipe, ci] feat: Support fsdp peft sft on npu (verl-project#2240)
### What does this PR do?

- Support FSDP PEFT SFT on NPU.
- Add CI actions to keep PEFT SFT and sequence parallelism working on NPU.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

Run examples/sft/gsm8k/run_qwen_05_peft_sp2_npu.sh on GPU and NPU:

```bash
torchrun --standalone --nnodes=1 --nproc_per_node=8 \
    -m verl.trainer.fsdp_sft_trainer \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.prompt_key=extra_info \
    data.response_key=extra_info \
    optim.lr=1e-4 \
    data.prompt_dict_keys=['question'] \
    +data.response_dict_keys=['answer'] \
    data.micro_batch_size_per_gpu=64 \
    model.partial_pretrain=Qwen/Qwen2.5-0.5B-Instruct \
    trainer.default_local_dir=$save_path \
    trainer.project_name=gsm8k-sft \
    trainer.experiment_name=gsm8k-sft-qwen-2.5-0.5b-instruct \
    trainer.logger=['console'] \
    trainer.total_epochs=2 \
    trainer.default_hdfs_dir=null $@ \
    model.lora_rank=32 \
    model.lora_alpha=16 \
    model.target_modules=all-linear \
    model.strategy=fsdp \
    ulysses_sequence_parallel_size=2 \
    use_remove_padding=true
```

Mean absolute error of train loss:

![train_loss_mae](https://github.com/user-attachments/assets/f0c436ae-4d92-44c9-bca8-0b7cde1f4cfe)

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

Enable sequence parallelism (SP):

```bash
--ulysses_sequence_parallel_size=2
--use_remove_padding=true
```

NPU does not support fsdp2, so `model.strategy` must be set explicitly:

```bash
--model.strategy=fsdp
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
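The plot linked above reports the mean absolute error (MAE) between the GPU and NPU train-loss curves. The following is a minimal sketch of how such a comparison could be computed; it is not part of this PR, and the file names (`gpu_train_loss.csv`, `npu_train_loss.csv`) and the `step`/`loss` CSV layout are assumptions made for illustration.

```python
# Sketch: compare per-step train loss from a GPU run and an NPU run of the same script.
# File names and column names ("step", "loss") are illustrative assumptions.
import pandas as pd

gpu = pd.read_csv("gpu_train_loss.csv").set_index("step")["loss"]
npu = pd.read_csv("npu_train_loss.csv").set_index("step")["loss"]

# Align on common steps so runs of slightly different lengths can still be compared.
common = gpu.index.intersection(npu.index)
abs_diff = (gpu.loc[common] - npu.loc[common]).abs()

mae = abs_diff.mean()                          # mean absolute error of train loss
rel_err = mae / gpu.loc[common].abs().mean()   # relative error w.r.t. the GPU curve

print(f"MAE of train loss: {mae:.6f}")
print(f"Relative error:    {rel_err:.2%}")
```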
1 parent 82d1ef5 commit 1bdf4d2

4 files changed

Lines changed: 72 additions & 5 deletions

File tree

- .github/workflows/e2e_ascend.yml
- docs/ascend_tutorial/ascend_quick_start.rst
- examples/sft/gsm8k/run_qwen_05_peft_sp2_npu.sh
- tests/special_npu/run_qwen2_5_05b_sft_peft_sp2.sh

.github/workflows/e2e_ascend.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -120,10 +120,10 @@ jobs:
         run: |
           ray stop --force
           python3 examples/data_preprocess/geo3k.py
-      - name: Running gsm8k e2e training tests with LoRA on ASCEND NPU
+      - name: Running gsm8k e2e training tests with peft sft on ASCEND NPU
         run: |
           ray stop --force
-          bash tests/special_e2e/sft/run_sft.sh
+          bash tests/special_npu/run_qwen2_5_05b_sft_peft_sp2.sh
           rm -rf $HOME/ckpts
       - name: Running gsm8k e2e training tests with GRPO on ASCEND NPU
         run: |
```

docs/ascend_tutorial/ascend_quick_start.rst

Lines changed: 4 additions & 3 deletions

```diff
@@ -10,7 +10,7 @@ Last updated: 06/17/2025.
 
 Atlas 200T A2 Box16
 
-Atlas 800T A2
+Atlas 900 A2 PODc
 
 
 安装
@@ -47,7 +47,7 @@ vllm & vllm-ascend
     # for Atlas 200T A2 Box16
     VLLM_TARGET_DEVICE=empty pip install -e . --extra-index https://download.pytorch.org/whl/cpu/
 
-    # for Atlas 800T A2
+    # for Atlas 900 A2 PODc
     VLLM_TARGET_DEVICE=empty pip install -e .
 
 .. code-block:: bash
@@ -172,7 +172,8 @@ vllm & vllm-ascend
 +-----------+-------------------------+-------------+-------------------+----------------------+
 | DAPO      | Qwen2.5-7B-instruct     | 3.83%       | pending           | Atlas 200T A2 Box16  |
 +-----------+-------------------------+-------------+-------------------+----------------------+
-
+| SFT-PEFT  | Qwen2.5-0.5B-instruct   | 0.06%       | 0.305             | Atlas 900 A2 PODc    |
++-----------+-------------------------+-------------+-------------------+----------------------+
 
 精度对比说明
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
examples/sft/gsm8k/run_qwen_05_peft_sp2_npu.sh (new file, referenced in the PR description)

Lines changed: 36 additions & 0 deletions

```bash
set -x

if [ "$#" -lt 2 ]; then
    echo "Usage: run_qwen2_5_05b_sft_peft_sp2_npu.sh <nproc_per_node> <save_path> [other_configs...]"
    exit 1
fi

nproc_per_node=$1
save_path=$2

# Shift the arguments so $@ refers to the rest
shift 2

torchrun --standalone --nnodes=1 --nproc_per_node=$nproc_per_node \
    -m verl.trainer.fsdp_sft_trainer \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.prompt_key=extra_info \
    data.response_key=extra_info \
    optim.lr=1e-4 \
    data.prompt_dict_keys=['question'] \
    +data.response_dict_keys=['answer'] \
    data.micro_batch_size_per_gpu=64 \
    model.partial_pretrain=Qwen/Qwen2.5-0.5B-Instruct \
    trainer.default_local_dir=$save_path \
    trainer.project_name=gsm8k-sft \
    trainer.experiment_name=gsm8k-sft-qwen-2.5-0.5b-instruct \
    trainer.logger=['console'] \
    trainer.total_epochs=2 \
    trainer.default_hdfs_dir=null $@ \
    model.lora_rank=32 \
    model.lora_alpha=16 \
    model.target_modules=all-linear \
    model.strategy=fsdp \
    ulysses_sequence_parallel_size=2 \
    use_remove_padding=true
```
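As in the existing GSM8K SFT example scripts, the first two positional arguments select the number of processes and the checkpoint directory, and any remaining arguments are forwarded to the trainer via `$@`, e.g. `bash run_qwen2_5_05b_sft_peft_sp2_npu.sh 8 ./ckpts` (the checkpoint path here is illustrative).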
tests/special_npu/run_qwen2_5_05b_sft_peft_sp2.sh (new file, invoked by the CI workflow change above)

Lines changed: 30 additions & 0 deletions

```bash
set -x

mkdir -p ./save_ckpts

torchrun --standalone --nnodes=1 --nproc_per_node=8 \
    -m verl.trainer.fsdp_sft_trainer \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.prompt_key=extra_info \
    data.response_key=extra_info \
    optim.lr=1e-4 \
    data.prompt_dict_keys=['question'] \
    +data.response_dict_keys=['answer'] \
    data.micro_batch_size_per_gpu=32 \
    model.partial_pretrain=Qwen/Qwen2.5-0.5B-Instruct \
    trainer.default_local_dir=./save_ckpts \
    trainer.project_name=gsm8k-sft \
    trainer.experiment_name=gsm8k-sft-qwen-2.5-0.5b-instruct \
    trainer.logger=['console'] \
    trainer.total_epochs=1 \
    trainer.total_training_steps=1 \
    trainer.default_hdfs_dir=null $@ \
    model.lora_rank=32 \
    model.lora_alpha=16 \
    model.target_modules=all-linear \
    model.strategy=fsdp \
    ulysses_sequence_parallel_size=2 \
    use_remove_padding=true

rm -rf ./outputs ./save_ckpts
```
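For reference, the `model.lora_rank=32`, `model.lora_alpha=16`, and `model.target_modules=all-linear` options used in both scripts correspond to a standard PEFT LoRA setup. The sketch below shows an illustrative standalone equivalent using Hugging Face PEFT; it is not the code path `verl.trainer.fsdp_sft_trainer` actually takes, and `target_modules="all-linear"` assumes a reasonably recent `peft` release.

```python
# Illustrative standalone equivalent of the LoRA options in the scripts above.
# This uses Hugging Face PEFT directly and is not verl's internal wiring.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

lora_cfg = LoraConfig(
    r=32,                         # model.lora_rank=32
    lora_alpha=16,                # model.lora_alpha=16
    target_modules="all-linear",  # model.target_modules=all-linear
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```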
