PyTorch implementation of "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition", CVPR 2020 Oral.
[PDF][Demo][Abstract/Supp]
- Python >= 3.6
- PyTorch >= 1.2.0
- NVIDIA Apex (auto mixed precision training)
- PyYAML, tqdm, tensorboardX
Disk usage warning: after preprocessing, the total sizes of datasets are around 38GB, 77GB, 63GB for NTU RGB+D 60, NTU RGB+D 120, and Kinetics 400, respectively. The raw/intermediate sizes may be larger.
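As a rough pre-flight check for the warning above, the sketch below sums the quoted post-preprocessing sizes and compares them with the free space on a target drive. The dataset keys, the function names, and the `headroom` safety factor are assumptions made for illustration; they are not part of this repo.

```python
import shutil

# Approximate post-preprocessing sizes in GB, copied from the warning above.
DATASET_SIZES_GB = {"ntu60": 38, "ntu120": 77, "kinetics400": 63}


def required_gb(datasets):
    """Sum the approximate preprocessed sizes of the chosen datasets."""
    return sum(DATASET_SIZES_GB[name] for name in datasets)


def has_enough_space(path, datasets, headroom=1.5):
    """Return True if `path` has room for the datasets times a safety factor.

    `headroom` is a hypothetical multiplier meant to cover the larger
    raw/intermediate files; tune it to your setup.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb(datasets) * headroom


if __name__ == "__main__":
    wanted = ["ntu60", "ntu120", "kinetics400"]
    print(f"Need roughly {required_gb(wanted)} GB after preprocessing")
    print("Enough space here:", has_enough_space(".", wanted))
```

Pass only the datasets you plan to use, e.g. `has_enough_space("/mnt/data", ["ntu60"])`.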
There are 3 datasets to download:
- NTU RGB+D 60 Skeleton
- NTU RGB+D 120 Skeleton
- Kinetics 400 Skeleton
Request dataset here: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp
Download the skeleton-only datasets:
- `nturgbd_skeletons_s001_to_s017.zip` (NTU RGB+D 60)
- `nturgbd_skeletons_s018_to_s032.zip` (NTU RGB+D 120, on top of NTU RGB+D 60)
- Total size should be 5.8GB + 4.5GB.
Download missing skeletons lookup files from the authors' GitHub repo:
NTU RGB+D 60 Missing Skeletons:
wget https://raw.githubusercontent.com/shahroudy/NTURGB-D/master/Matlab/NTU_RGBD_samples_with_missing_skeletons.txt
NTU RGB+D 120 Missing Skeletons:
wget https://raw.githubusercontent.com/shahroudy/NTURGB-D/master/Matlab/NTU_RGBD120_samples_with_missing_skeletons.txt
Remember to remove the first few lines of text in these files!
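One possible way to strip the leading text automatically is sketched below. It assumes every real entry is a bare NTU sample name of the form `S001C002P005R002A008` on its own line, and it drops everything else. The helper names are hypothetical, and it is worth eyeballing the result before running the generation scripts.

```python
import re
from pathlib import Path

# NTU sample names look like S001C002P005R002A008
# (setup, camera, performer, replication, action).
SAMPLE_NAME = re.compile(r"^S\d{3}C\d{3}P\d{3}R\d{3}A\d{3}$")


def keep_sample_names(lines):
    """Return only the lines that look like NTU sample names."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if SAMPLE_NAME.match(line)]


def clean_file(path):
    """Rewrite `path` in place with one sample name per line.

    Returns the number of sample names kept.
    """
    path = Path(path)
    names = keep_sample_names(path.read_text().splitlines())
    path.write_text("\n".join(names) + "\n")
    return len(names)
```

For example, `clean_file("data/nturgbd_raw/NTU_RGBD_samples_with_missing_skeletons.txt")` would rewrite that file once it is in place.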
- Download dataset from ST-GCN repo: https://github.com/yysijie/st-gcn/blob/master/OLD_README.md#kinetics-skeleton
- This might be useful if you want to `wget` the dataset from Google Drive
Put downloaded data into the following directory structure:
- data/
  - kinetics_raw/
    - kinetics_train/
      ...
    - kinetics_val/
      ...
    - kinetics_train_label.json
    - kinetics_val_label.json
  - nturgbd_raw/
    - nturgb+d_skeletons/     # from `nturgbd_skeletons_s001_to_s017.zip`
      ...
    - nturgb+d_skeletons120/  # from `nturgbd_skeletons_s018_to_s032.zip`
      ...
    - NTU_RGBD_samples_with_missing_skeletons.txt
    - NTU_RGBD120_samples_with_missing_skeletons.txt
NTU RGB+D
cd data_gen
python3 ntu_gendata.py
python3 ntu120_gendata.py
- Time estimate is ~3 hrs to generate NTU 120 on a single core (feel free to parallelize the code :))
Kinetics
python3 kinetics_gendata.py
- ~70 mins to generate Kinetics data
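Since the generation scripts run for a long time, it may be worth confirming the raw data layout before starting them. The sketch below lists the expected entries that are missing under a data root. The nesting (label files under `kinetics_raw/`, missing-skeleton lists under `nturgbd_raw/`) and the spelling `kinetics_val_label.json` are assumptions inferred from the layout described earlier; `missing_paths` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path

# Entries expected under the data root, relative to it (assumed nesting).
EXPECTED = [
    "kinetics_raw/kinetics_train",
    "kinetics_raw/kinetics_val",
    "kinetics_raw/kinetics_train_label.json",
    "kinetics_raw/kinetics_val_label.json",
    "nturgbd_raw/nturgb+d_skeletons",
    "nturgbd_raw/nturgb+d_skeletons120",
    "nturgbd_raw/NTU_RGBD_samples_with_missing_skeletons.txt",
    "nturgbd_raw/NTU_RGBD120_samples_with_missing_skeletons.txt",
]


def missing_paths(data_root="data"):
    """Return the expected entries that do not exist under `data_root`."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("data")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  ", rel)
    else:
        print("Raw data layout looks complete.")
```

If you only use one dataset, trim `EXPECTED` accordingly.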
Generate the bone data with:
python gen_bone_data.py --dataset ntu
python gen_bone_data.py --dataset ntu120
python gen_bone_data.py --dataset kinetics
Download pretrained models for producing the final results on NTU RGB+D 60, NTU RGB+D 120, Kinetics Skeleton 400: [Dropbox][GoogleDrive][WeiYun]
Put the folder of pretrained models at repo root:
- MS-G3D/
  - pretrained-models/
  - main.py
  - ...
- Run `bash eval_pretrained.sh`
- The general training template command:
python3 main.py
  --config <config file>
  --work-dir <place to keep things (weights, checkpoints, logs)>
  --device <GPU IDs to use>
  --half   # Mixed precision training with NVIDIA Apex (default O1) for GPUs ~11GB memory
  [--base-lr <base learning rate>]
  [--batch-size <batch size>]
  [--weight-decay <weight decay>]
  [--forward-batch-size <batch size during forward pass, useful if using only 1 GPU>]
  [--eval-start <which epoch to start evaluating the model>]
- The general testing template command:
python3 main.py
  --config <config file>
  --work-dir <place to keep things>
  --device <GPU IDs to use>
  --weights <path to model weights>
  [--test-batch-size <...>]
- Template for joint-bone two-stream fusion:
python3 ensemble.py
  --dataset <dataset to ensemble, e.g. ntu120/xsub>
  --joint-dir <work_dir of your test command for joint model>
  --bone-dir <work_dir of your test command for bone model>
Use the corresponding config files from `./config` to train/test different datasets
Examples
- Train on NTU 120 XSub Joint
  - Train with 1 GPU:
    python3 main.py --config ./config/nturgbd120-cross-subject/train_joint.yaml
  - Train with 2 GPUs:
    python3 main.py --config ./config/nturgbd120-cross-subject/train_joint.yaml --batch-size 32 --forward-batch-size 32 --device 0 1
- Test on NTU 120 XSet Bone
  python3 main.py --config ./config/nturgbd120-cross-setup/test_bone.yaml
- Batch size 32 on 1 GPU (BS 16 per forward pass by accumulating gradients):
  python3 main.py --config <...> --batch-size 32 --forward-batch-size 16 --device 0
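The last example above splits a batch of 32 into forward passes of 16 and accumulates gradients. The framework-free sketch below illustrates why that can match a single full-batch step, assuming a mean-reduced loss: the size-weighted average of the chunk gradients equals the full-batch gradient. The one-parameter model and the function names are hypothetical and are not taken from `main.py`.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, forward_batch_size):
    """Accumulate per-chunk gradients, weighting each chunk by its share of the batch."""
    total = len(xs)
    grad = 0.0
    for start in range(0, total, forward_batch_size):
        chunk_x = xs[start:start + forward_batch_size]
        chunk_y = ys[start:start + forward_batch_size]
        grad += mse_grad(w, chunk_x, chunk_y) * len(chunk_x) / total
    return grad


if __name__ == "__main__":
    xs = [i / 10 for i in range(32)]   # toy batch of 32 samples
    ys = [3 * x + 1 for x in xs]
    full = mse_grad(0.5, xs, ys)
    split = accumulated_grad(0.5, xs, ys, forward_batch_size=16)
    print(f"full-batch grad {full:.6f}, accumulated grad {split:.6f}")
```

The trade-off is memory for time: each forward pass holds only 16 samples, but one optimizer step takes two passes.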
Resume training from checkpoint
python3 main.py
  ... # Same params as before
  --start-epoch <0 indexed epoch>
  --weights <weights in work_dir>
  --checkpoint <checkpoint in work_dir>
It's recommended to linearly scale up base LR with > 2 GPUs (https://arxiv.org/pdf/1706.02677.pdf, Section 2.1) to use 16 samples per worker during training; e.g.
- 1 GPU: `--base-lr 0.05 --device 0 --batch-size 32 --forward-batch-size 16`
- 2 GPUs: `--base-lr 0.05 --device 0 1 --batch-size 32 --forward-batch-size 32`
- 4 GPUs: `--base-lr 0.1 --device 0 1 2 3 --batch-size 64 --forward-batch-size 64`
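The three settings above follow a simple pattern that can be written down as a small helper. This is a sketch that reproduces those examples under stated assumptions: 16 samples per GPU per forward pass, a minimum total batch of 32, and a base LR of 0.05 at batch size 32 scaled linearly beyond that. The function names are hypothetical, and the extrapolation to other GPU counts is untested.

```python
REFERENCE_LR = 0.05      # base LR used with a total batch size of 32
REFERENCE_BATCH = 32
SAMPLES_PER_GPU = 16     # samples per worker per forward pass


def suggest_settings(num_gpus):
    """Return batch size, forward batch size, and linearly scaled base LR."""
    forward_batch_size = SAMPLES_PER_GPU * num_gpus
    batch_size = max(REFERENCE_BATCH, forward_batch_size)
    base_lr = REFERENCE_LR * batch_size / REFERENCE_BATCH
    return {
        "base_lr": base_lr,
        "batch_size": batch_size,
        "forward_batch_size": forward_batch_size,
    }


def format_flags(num_gpus):
    """Render the suggestion as command-line flags for main.py."""
    s = suggest_settings(num_gpus)
    devices = " ".join(str(i) for i in range(num_gpus))
    return (
        f"--base-lr {s['base_lr']:g} --device {devices} "
        f"--batch-size {s['batch_size']} "
        f"--forward-batch-size {s['forward_batch_size']}"
    )


if __name__ == "__main__":
    for gpus in (1, 2, 4):
        print(f"{gpus} GPU(s): {format_flags(gpus)}")
```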
Unfortunately, different PyTorch/CUDA versions & GPU setups can cause different levels of memory usage, and so you may experience out of memory (OOM) on some machines but not others.
- 1080Ti GPUs with `--half` and `--amp-opt-level 1` (default) are relatively more stable
If OOM occurs, try using Apex O2 by setting `--amp-opt-level 2`. However, note that
- NVIDIA Apex does not yet support `nn.DataParallel` for O2 (see NVIDIA/apex#227)
  - This means you may need to train on a single GPU when using O2
  - It may also impact the stability of training and/or the final performance
Default hyperparameters are stored in the config files; you can tune them & add extra training techniques to boost performance
The best joint-bone fusion result may not come from the best single stream models; for example, we provided 3 pretrained models for NTU RGB+D 60 XSub joint stream where the best fusion performance comes from the slightly underperforming model (~89.3%) instead of the reported (~89.4%) and the slightly better retrained model (~89.6%).
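The toy sketch below illustrates how a weaker single stream can still fuse better. It assumes fusion adds the per-class scores of the two streams (with a hypothetical weight `alpha` on the bone stream) and takes the argmax. All scores are made-up numbers chosen to show the effect; they are not outputs of the pretrained models, and the function names are illustrative rather than those of `ensemble.py`.

```python
def fuse(joint_scores, bone_scores, alpha=1.0):
    """Add per-class scores from both streams; `alpha` weights the bone stream."""
    return [j + alpha * b for j, b in zip(joint_scores, bone_scores)]


def predict(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(score_rows, labels):
    """Fraction of samples whose top-scoring class matches the label."""
    correct = sum(predict(row) == label for row, label in zip(score_rows, labels))
    return correct / len(labels)


# Made-up two-class scores for four samples.
labels = [0, 1, 0, 1]
bone = [[0.9, 0.1], [0.1, 0.9], [0.4, 0.6], [0.6, 0.4]]
# Joint model A: better alone, but only weakly confident where bone is wrong.
joint_a = [[0.9, 0.1], [0.1, 0.9], [0.55, 0.45], [0.55, 0.45]]
# Joint model B: worse alone, but confident exactly where bone is wrong.
joint_b = [[0.45, 0.55], [0.55, 0.45], [0.9, 0.1], [0.1, 0.9]]

if __name__ == "__main__":
    for name, joint in (("A", joint_a), ("B", joint_b)):
        fused = [fuse(j, b) for j, b in zip(joint, bone)]
        print(f"joint {name}: alone {accuracy(joint, labels):.2f}, "
              f"fused {accuracy(fused, labels):.2f}")
```

In this toy setup model A is more accurate alone, yet model B gives the better fused result because its errors complement those of the bone stream.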
This repo is based on
Thanks to the original authors for their work!
Please cite this work if you find it useful:
@inproceedings{liu2020disentangling,
  title={Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition},
  author={Liu, Ziyu and Zhang, Hongwen and Chen, Zhenghao and Wang, Zhiyong and Ouyang, Wanli},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={143--152},
  year={2020}
}
Please email kenziyuliu AT outlook.com for further questions







