MWC

Mask wearing classifier.

Variant	Size	F1	CPU inference latency	ONNX
P	115 KB	0.9981	0.23 ms	Download
N	176 KB	0.9995	0.41 ms	Download
T	280 KB	0.9996	0.52 ms	Download
S	495 KB	0.9998	0.64 ms	Download
L	6.4 MB	0.9998	1.03 ms	Download

Setup

git clone https://github.com/PINTO0309/MWC.git && cd MWC
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate

Inference

uv run python demo_mwc.py \
-hm mwc_l_48x48.onnx \
-v 0 \
-ep cuda \
-dlr -dnm -dgm -dhm -dhd

uv run python demo_mwc.py \
-hm mwc_l_48x48.onnx \
-v 0 \
-ep tensorrt \
-dlr -dnm -dgm -dhm -dhd

Archive extraction

Extract images from the source archive into numbered folders under data/, storing up to 2,000 images per folder:

python 00_extract_tar.py \
--archive /path/to/train_aug_120x120_part_masked_clean.tar.gz \
--output-dir data \
--images-per-dir 2000
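
The bucketing rule described above (at most 2,000 images per numbered folder) can be sketched as follows. This is an illustration only, not the code of 00_extract_tar.py: the function name `folder_for_index`, the 1-based numbering, and the zero-padded four-digit folder names are all assumptions.

```python
from pathlib import Path


def folder_for_index(index: int, output_dir: str = "data", images_per_dir: int = 2000) -> Path:
    """Return the numbered folder for the image at 0-based position `index`.

    Assumption: folders are numbered from 1 and zero-padded to four digits.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    if images_per_dir <= 0:
        raise ValueError("images_per_dir must be positive")
    folder_number = index // images_per_dir + 1
    return Path(output_dir) / f"{folder_number:04d}"
```

With the default limit, images 0 to 1999 land in the first folder and image 2000 opens the second.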

Dataset parquet

Generate a parquet dataset with embedded resized image bytes:

SIZE=48x48 # HxW
python 01_build_mask_parquet.py \
--root data \
--output data/dataset_${SIZE}.parquet \
--image-size ${SIZE}

Labels are derived from filenames:

*_mask_* -> masked / 1
otherwise -> no_masked / 0
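
A minimal sketch of this labelling rule, assuming the pattern is matched case-sensitively against the file's base name. The helper name `label_from_filename` and the example paths are hypothetical and are not taken from 01_build_mask_parquet.py.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def label_from_filename(path: str) -> tuple[str, int]:
    """Map a file path to (class_name, class_id) using the *_mask_* rule."""
    name = Path(path).name  # only the base name is inspected (assumption)
    if fnmatchcase(name, "*_mask_*"):
        return ("masked", 1)
    return ("no_masked", 0)
```

Note that the pattern requires an underscore on both sides of `mask`, so a name such as `mask_0001.png` falls into the `no_masked` class.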

Data sample

1	2	3	4	5

Training Pipeline

The training loop relies on BCEWithLogitsLoss plus class-balanced pos_weight to stabilise optimisation under class imbalance; inference produces sigmoid probabilities. Use --train_resampling weighted to switch on the previous WeightedRandomSampler behaviour, or --train_resampling balanced to physically duplicate minority classes before shuffling.
Training history, validation metrics, optional test predictions, checkpoints, configuration JSON, and ONNX exports are produced automatically.
Per-epoch checkpoints named like mwc_epoch_0001.pt are retained (latest 10), as well as the best checkpoints named mwc_best_epoch0004_f1_0.9321.pt (also latest 10).
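
As a rough illustration of the loss described above, the sketch below reimplements weighted binary cross-entropy on logits in plain Python, following the formula documented for PyTorch's BCEWithLogitsLoss. The rule pos_weight = negatives / positives and all function names are assumptions made for illustration; the weighting used by the project may differ.

```python
import math


def pos_weight_from_labels(labels):
    """Class-balanced weight for the positive class: n_negative / n_positive (assumed rule)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    return n_neg / n_pos


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_with_logits(logit, target, pos_weight=1.0):
    """loss = -[pos_weight * y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]."""
    # -log(sigmoid(z)) == softplus(-z) and -log(1 - sigmoid(z)) == softplus(z)
    return pos_weight * target * softplus(-logit) + (1.0 - target) * softplus(logit)


def sigmoid(z):
    """Turn a raw logit into a probability, as done at inference time."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

With one positive sample for every three negatives, the positive term is weighted three times as heavily, so a missed positive costs as much as three missed negatives.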

The backbone can be switched with --arch_variant. Supported combinations with --head_variant are:

`--arch_variant`	Default (`--head_variant auto`)	Explicitly selectable heads	Remarks
`baseline`	`avg`	`avg`, `avgmax_mlp`	When using `transformer`/`mlp_mixer`, the height and width of the feature map must be divisible by `--token_mixer_grid`; otherwise an exception is raised during ONNX conversion or inference.
`inverted_se`	`avgmax_mlp`	`avg`, `avgmax_mlp`	When using `transformer`/`mlp_mixer`, adjust `--token_mixer_grid` as above.
`convnext`	`transformer`	`avg`, `avgmax_mlp`, `transformer`, `mlp_mixer`	For the `transformer` and `mlp_mixer` heads, the feature map must be divisible by the grid (the default `3x2` fits a 30x48 input).
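
The divisibility requirement in the remarks can be checked ahead of training with a small helper such as the one below. It is a sketch under stated assumptions: the grid string is read as HEIGHTxWIDTH, and the feature-map size is passed in by the caller because it depends on the backbone's downsampling, which is not specified here. The names `parse_hw` and `grid_fits` are hypothetical.

```python
def parse_hw(text: str) -> tuple[int, int]:
    """Parse 'HxW' (e.g. '3x2') into (height, width)."""
    h, w = text.lower().split("x")
    return int(h), int(w)


def grid_fits(feature_hw: tuple[int, int], grid: str) -> bool:
    """True when both feature-map sides are divisible by the matching grid sides."""
    grid_h, grid_w = parse_hw(grid)
    feat_h, feat_w = feature_hw
    return feat_h % grid_h == 0 and feat_w % grid_w == 0
```

For example, a hypothetical 6x6 feature map works with a `3x3` or `3x2` grid, while a 4x6 map does not fit `3x2` because 4 is not divisible by 3.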

The classification head is selected with --head_variant (avg, avgmax_mlp, transformer, mlp_mixer, or auto which derives a sensible default from the backbone).
Pass --rgb_to_yuv_to_y to convert RGB crops to YUV, keep only the Y (luma) channel inside the network, and train a single-channel stem without modifying the dataloader.
Alternatively, use --rgb_to_lab or --rgb_to_luv to convert inputs to CIE Lab/Luv (3-channel) before the stem; these options are mutually exclusive with each other and with --rgb_to_yuv_to_y.
Mixed precision can be enabled with --use_amp when CUDA is available.
Resume training with --resume path/to/mwc_epoch_XXXX.pt; all optimiser/scheduler/AMP states and history are restored.
Loss/accuracy/F1 metrics are logged to TensorBoard under output_dir, and tqdm progress bars expose per-epoch progress for train/val/test loops.
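
To make the --rgb_to_yuv_to_y option concrete, the sketch below reduces RGB pixels to a single luma channel. The BT.601 coefficients are an assumption, since the YUV variant used by the project is not stated here, and the plain-list image representation is chosen only to keep the example dependency-free.

```python
def rgb_to_luma(pixel):
    """BT.601 luma (assumed variant) from an (R, G, B) triple with components in [0, 1]."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def image_to_y(image):
    """Convert an H x W image of RGB triples into an H x W single-channel image."""
    return [[rgb_to_luma(px) for px in row] for row in image]
```

The output has one value per pixel instead of three, which is why the network can use a single-channel stem while the dataloader keeps delivering RGB crops.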

Baseline depthwise-separable CNN:

SIZE=48x48
uv run python -m mwc train \
--data_root data/dataset_${SIZE}.parquet \
--output_dir runs/mwc_${SIZE} \
--epochs 40 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant baseline \
--seed 42 \
--device auto \
--use_amp

Inverted residual + SE variant (recommended for higher capacity):

SIZE=48x48
VAR=s
uv run python -m mwc train \
--data_root data/dataset_${SIZE}.parquet \
--output_dir runs/mwc_is_${VAR}_${SIZE} \
--epochs 40 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant inverted_se \
--head_variant avgmax_mlp \
--seed 42 \
--device auto \
--use_amp

ConvNeXt-style backbone with transformer head over pooled tokens:

SIZE=48x48
uv run python -m mwc train \
--data_root data/dataset_${SIZE}.parquet \
--output_dir runs/mwc_convnext_${SIZE} \
--epochs 40 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant convnext \
--head_variant transformer \
--token_mixer_grid 3x3 \
--seed 42 \
--device auto \
--use_amp

Outputs include the latest 10 mwc_epoch_*.pt, the latest 10 mwc_best_epochXXXX_f1_YYYY.pt (highest validation F1, or training F1 when there is no validation split), history.json, summary.json, optional test_predictions.csv, and train.log.
After every epoch, a confusion matrix and an ROC curve are saved under runs/mwc/diagnostics/<split>/ as confusion_<split>_epochXXXX.png and roc_<split>_epochXXXX.png.
--image_size accepts either a single integer for square crops (e.g. --image_size 48) or HEIGHTxWIDTH to resize non-square frames (e.g. --image_size 64x48).
Add --resume <checkpoint> to continue from an earlier epoch. Note that --epochs is the total epoch count, not the number of additional epochs (e.g. resuming with --epochs 40 from a checkpoint saved at epoch 30 runs 10 more epochs).
Launch TensorBoard with:
```
tensorboard --logdir runs/mwc
```
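
The --image_size and --epochs conventions described above can be sketched as two small helpers. These are hypothetical illustrations, not the project's argument parser: the names `parse_image_size` and `remaining_epochs` are assumptions.

```python
def parse_image_size(value: str) -> tuple[int, int]:
    """Return (height, width) from a square size such as '48' or from 'HEIGHTxWIDTH'."""
    parts = value.lower().split("x")
    if len(parts) == 1:
        side = int(parts[0])
        height, width = side, side
    elif len(parts) == 2:
        height, width = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"expected N or HxW, got {value!r}")
    if height <= 0 or width <= 0:
        raise ValueError("image size must be positive")
    return height, width


def remaining_epochs(total_epochs: int, completed_epochs: int) -> int:
    """Epochs still to run when --epochs is a total count (never negative)."""
    return max(total_epochs - completed_epochs, 0)
```

For instance, `parse_image_size("64x48")` yields a height of 64 and a width of 48, and `remaining_epochs(40, 30)` gives the 10 additional epochs from the resume example.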

ONNX Export

uv run python -m mwc exportonnx \
--checkpoint runs/mwc_is_s_48x48/mwc_best_epoch0049_f1_0.9939.pt \
--output mwc_s_48x48.onnx \
--opset 17
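
Because the best checkpoints encode their F1 score in the file name, the one to export can be picked programmatically. The sketch below assumes the mwc_best_epochXXXX_f1_YYYY.pt naming shown earlier; the helper `pick_best_checkpoint` is hypothetical and not part of the mwc package.

```python
import re
from pathlib import Path

# Matches names such as mwc_best_epoch0004_f1_0.9321.pt
BEST_PATTERN = re.compile(r"^mwc_best_epoch(\d+)_f1_(\d+\.\d+)\.pt$")


def pick_best_checkpoint(names):
    """Return the name with the highest F1 (ties broken by the later epoch), or None."""
    best = None
    best_key = None
    for name in names:
        match = BEST_PATTERN.match(Path(name).name)
        if match is None:
            continue  # skip per-epoch checkpoints and unrelated files
        key = (float(match.group(2)), int(match.group(1)))
        if best_key is None or key > best_key:
            best, best_key = name, key
    return best
```

The returned name can then be passed to --checkpoint.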

Arch

Ultra-lightweight classification model series

VSDLM: Visual-only speech detection driven by lip movements - MIT License
OCEC: Open closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
PGC: Ultrafast pointing gesture classification - MIT License
SC: Ultrafast sitting classification - MIT License
PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
HSC: Happy smile classifier - MIT License
WHC: Waving Hand Classification - MIT License
UHD: Ultra-lightweight human detection - MIT License
MWC: Mask wearing classifier. - MIT License

Citation

If you find this project useful, please consider citing:

@software{hyodo2026mwc,
  author    = {Katsuya Hyodo},
  title     = {PINTO0309/MWC},
  month     = {04},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19617672},
  url       = {https://github.com/PINTO0309/mwc},
  abstract  = {Mask wearing classifier.},
}

Acknowledgments

https://github.com/cleardusk/3DDFA: MIT License

@misc{3ddfa_cleardusk,
  author =       {Guo, Jianzhu and Zhu, Xiangyu and Lei, Zhen},
  title =        {3DDFA},
  howpublished = {\url{https://github.com/cleardusk/3DDFA}},
  year =         {2018}
}

@inproceedings{guo2020towards,
  title=        {Towards Fast, Accurate and Stable 3D Dense Face Alignment},
  author=       {Guo, Jianzhu and Zhu, Xiangyu and Yang, Yang and Yang, Fan and Lei, Zhen and Li, Stan Z},
  booktitle=    {Proceedings of the European Conference on Computer Vision (ECCV)},
  year=         {2020}
}

@article{zhu2017face,
  title=      {Face alignment in full pose range: A 3d total solution},
  author=     {Zhu, Xiangyu and Liu, Xiaoming and Lei, Zhen and Li, Stan Z},
  journal=    {IEEE transactions on pattern analysis and machine intelligence},
  year=       {2017},
  publisher=  {IEEE}
}

https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34: Apache 2.0 License

@software{DEIMv2-Wholebody34,
  author={Katsuya Hyodo},
  title={Lightweight human detection models generated on high-quality human data sets. It can detect objects with high accuracy and speed in a total of 28 classes: body, adult, child, male, female, body_with_wheelchair, body_with_crutches, head, front, right-front, right-side, right-back, back, left-back, left-side, left-front, face, eye, nose, mouth, ear, collarbone, shoulder, solar_plexus, elbow, wrist, hand, hand_left, hand_right, abdomen, hip_joint, knee, ankle, foot.},
  url={https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34},
  year={2025},
  month={10},
  doi={10.5281/zenodo.17625710}
}
