Publications – CVML @ NUS

2024

Coherent Temporal Synthesis for Incremental Action Segmentation
[paper]
G. Ding, H. Golong & A. Yao
CVPR 2024.

Can I Trust Your Answer? Visually Grounded Video Question Answering
[paper][code]
J. Xiao, A. Yao, Y. Li & T. Chua
CVPR 2024.

Deep Imbalanced Regression via Hierarchical Classification Adjustment
[paper]
H. Xiong, A. Yao
CVPR 2024.

KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
[code]
F. Yang, K. Gu & A. Yao
CVPR 2024.

Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
[paper] [webpage] [code]
K. Xu, Z. Yu, X. Wang, M. Bi, A. Yao
CVPR 2024.

Scaling for training time and post-hoc out-of-distribution detection enhancement
[paper][code]
K. Xu, R. Chen, G. Franchi & A. Yao
ICLR 2024.

A closer look at branch classifiers of multi-exit architectures
[paper]
S. Lin, B. Ji, R. Ji & A. Yao.
Computer Vision and Image Understanding (CVIU) 2024.

Rethinking Visibility in Human Pose Estimation: Occluded Pose Reasoning via Transformers
[paper][code]
P. Sun, K. Gu, Y. Wang, L. Yang & A. Yao.
WACV 2024.

Learning to generate training datasets for robust semantic segmentation
[paper]
M. Hariat, O. Laurent, R. Kazmierczak, S. Zhang, A. Bursuc, A. Yao & G. Franchi
WACV 2024.

2023

Opening the Vocabulary of Egocentric Actions. D. Chatterjee, F. Sener, S. Ma, A. Yao. Neural Information Processing Systems (NeurIPS) 2023. [paper]
Syn-to-Real Pose Estimation through Geometric Reconstruction. Q. Lin, K. Gu, L. Yang, A. Yao. Neural Information Processing Systems (NeurIPS) 2023. [paper]
MHEntropy: Entropy Meets Multiple Hypotheses for Pose and Shape Recovery. R. Chen, L. Yang, A. Yao. International Conference on Computer Vision (ICCV) 2023. [paper]
HiFiHR: Enhancing 3D Hand Reconstruction from a Single Image via High-Fidelity Texture. J. Zhu, Z. Zhao, L. Yang, A. Yao. German Conference on Pattern Recognition (GCPR / DAGM) 2023. [paper]
C2F-TCN: A Framework for Semi- and Fully-Supervised Temporal Action Segmentation. D. Singhania, R. Rahaman and A. Yao. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). DOI: 10.1109/TPAMI.2023.3284080. [paper]
Contrastive Video Question Answering via Video Graph Transformer. J. Xiao, P. Zhou, A. Yao, Y. Li, R. Hong, S. Yan and T.-S. Chua. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). DOI: 10.1109/TPAMI.2023.3292266. [paper]
Temporal Action Segmentation: An Analysis of Modern Techniques. G. Ding, F. Sener, A. Yao. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). DOI: 10.1109/TPAMI.2023.3327284. [paper]
Bias-Compensated Integral Regression for Human Pose Estimation. K. Gu, L. Yang, M. Bi and A. Yao. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). DOI: 10.1109/.3264742. [paper]
Cross-domain 3D hand pose estimation with dual modalities. Q. Lin, L. Yang and A. Yao. CVPR 2023. [paper]
Analyzing and Diagnosing Pose Estimation with Attributions. Q. He, L. Yang, K. Gu, Q. Lin and A. Yao. CVPR 2023. [paper][project][code]
Overcoming the Tradeoff in Accuracy and Plausibility for 3D Hand Shape Reconstruction. Z. Yu, L. Chen, L. Yang, X. Zheng, M. Bi, G. Lee and A. Yao. CVPR 2023. [paper]
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training. J. Chen, K. Xu, Y. Wang, Y. Cheng and A. Yao. ICLR 2023. [paper][code]
Improving Deep Regression with Ordinal Entropy. S. Zhang, L. Yang, M. Bi, X. Zheng and A. Yao. ICLR 2023. [paper][project][code]

2022

Transferring Knowledge from Text to Video: Zero-Shot Anticipation for Procedural Actions. F. Sener, R. Saraf and A. Yao. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). 2022. DOI: 10.1109/TPAMI.2022.3218596. [paper]
UV-Based 3D Hand-Object Reconstruction with Grasp Optimization. Z. Yu, L. Yang, Y. Xie, P. Chen and A. Yao. BMVC 2022. [paper]
Temporal Action Segmentation with High-level Complex Activity Labels. G. Ding and A. Yao. IEEE Transactions on Multimedia (TMM). 2022. DOI: 10.1109/TMM.2022.3231099. [paper]
Leveraging Action Affinity and Continuity for Semi-Supervised Temporal Action Segmentation. G. Ding and A. Yao. ECCV 2022. [paper]
A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation. R. Rahaman, D. Singhania, A. Thiery, A. Yao. ECCV 2022. [paper]
Perception-Distortion Balanced ADMM Optimization for Single-Image Super-Resolution. Y. Zhang, B. Ji, J. Hao and A. Yao. ECCV 2022. [paper]
Discrete-Constrained Regression for Local Counting Models. H. Xiong and A. Yao. ECCV 2022. [paper]
Learning Deep Morphological Networks with Neural Architecture Search. Y. Hu, N. Belkhir, J. Angulo, A. Yao and G. Franchi. Pattern Recognition, 108893 (2022). [paper]
Multi-Scale Memory-Based Video Deblurring. B. Ji and A. Yao. CVPR 2022. [paper][code]
Accelerating Video Object Segmentation with Compressed Video. K. Xu and A. Yao. CVPR 2022. [paper][project][code]
TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates. Y. Xie, H. Mao, A. Yao and N. Thuerey. CVPR 2022. [paper][code][project][video]
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities. F. Sener, D. Chatterjee, D. Shelepov, K. He, D. Singhania, R. Wang and A. Yao. CVPR 2022. [paper][project]
Dive Deeper into Integral Pose Regression. K. Gu, L. Yang and A. Yao. ICLR 2022. [paper][video & slides]
Video as a Conditional Graph Hierarchy for Multi-Granular Question Answering (oral). J. Xiao, A. Yao, Z. Liu, Y. Li, W. Ji and T.-S. Chua. AAAI 2022. [paper][video & slides]
Comprehensive Regularization in a Bi-Directional Predictive Network for Video Anomaly Detection. C. Chen, Y. Xie, S. Lin, A. Yao, G. Jiang, W. Zhang, Y. Qu, R. Qiao, B. Ren and L. Ma. AAAI 2022. [paper][video & slides]
Iterative Contrast-Classify for Semi-Supervised Temporal Action Segmentation. D. Singhania, R. Rahaman and A. Yao. AAAI 2022. [paper][video & slides]

2021

Weakly-Supervised Dense Action Anticipation. H. Zhang, F. Chen and A. Yao. BMVC 2021. [paper][video][code]
Local and Global Point Cloud Reconstruction for 3D Hand Pose Estimation. Z. Yu, L. Yang, S. Chen and A. Yao. BMVC 2021. [paper][supp][video]
Reliable Semantic Segmentation with Superpixel-Mix. G. Franchi, N. Belkhir, M.L. Ha, Y. Hu, A. Bursuc, V. Blanz and A. Yao. BMVC 2021. [paper][supp][video][code]
Removing the bias of integral pose regression. K. Gu, L. Yang and A. Yao. ICCV 2021. [paper][supp][project]
SemiHand: Semi-Supervised Hand Pose Estimation With Consistency. L. Yang, S. Chen and A. Yao. ICCV 2021. [paper][supp]
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions. J. Xiao, X. Shang, A. Yao and T.-S. Chua. CVPR 2021. [paper][supp]
Towards Compact Single Image Super-Resolution via Contrastive Self-distillation. Y. Wang, S. Lin, Y. Qu, H. Wu, Z. Zhang, Y. Xie and A. Yao. IJCAI 2021. [paper]

2020

Multi-stage fusion for one-click segmentation. S. Majumder, A. Khurana, A. Rai and A. Yao. GCPR 2020. [paper]
Two-in-One Refinement for Interactive Segmentation. S. Majumder, A. Khurana, A. Rai and A. Yao. BMVC 2020. [paper][video]
Neural network compression via learnable wavelet transforms. M. Wolter, S. Lin and A. Yao. ICANN 2020. [paper]
Sequence prediction using spectral RNNs. M. Wolter, J. Gall and A. Yao. ICANN 2020. [paper]
Dual grid net: Hand mesh vertex regression from single depth maps. C. Wan, T. Probst, L. Van Gool and A. Yao. ECCV 2020. [paper]
Temporal aggregate representations for long range video understanding. F. Sener, D. Singhania and A. Yao. ECCV 2020. [paper]
Measuring generalisation to unseen viewpoints, articulations, shapes and objects for 3D hand pose estimation under hand-object interaction. A. Armagan et al. ECCV 2020. [paper]
Object-centered Fourier motion estimation and segment-transformation prediction. M. Wolter, A. Yao and S. Behnke. ESANN 2020. [paper][code]
Deep morphological networks. G. Franchi, A. Fehri, and A. Yao. Pattern Recognition Letters, 102:107246, 2020 [paper]

2019

A two-streamed network for estimating fine-scaled depth maps from single RGB images. J. Li, C. Yuce, R. Klein and A. Yao. Computer Vision and Image Understanding (CVIU), 186:25-36, 2019. [paper]
Zero-shot anticipation for instructional activities. F. Sener and A. Yao. ICCV 2019. [paper][supp][dataset]
Aligning latent spaces for 3D hand pose estimation. L. Yang*, S. Li*, D. Lee, and A. Yao. ICCV 2019. [paper][supp][poster][results]
Localized interactive instance segmentation. S. Majumder and A. Yao. GCPR 2019. [paper]
Scale-aware multi-level guidance for interactive instance segmentation. S. Majumder and A. Yao. CVPR 2019. [paper]
Self-supervised 3D hand pose estimation through training by fitting. C. Wan, T. Probst, L. Van Gool and A. Yao. CVPR 2019 (oral). [paper]
Disentangling latent hands for image synthesis and pose estimation. L. Yang and A. Yao. CVPR 2019. [paper][supp][poster][results]

2018

Gated complex recurrent neural networks. M. Wolter and A. Yao. NeurIPS 2018. [paper][code]
Learning style compatibility for furniture. D. Aggarwal, E. Valiyev, F. Sener, and A. Yao. GCPR 2018. [paper][dataset]
Supervised deep Kriging for single-image super-resolution. G. Franchi, A. Yao and A. Kolb. GCPR 2018 (oral). [paper]
Unsupervised discovery and segmentation of complex activities. F. Sener and A. Yao. CVPR 2018 (spotlight). [paper]
Dense 3D regression for hand pose estimation. C. Wan, T. Probst, L. Van Gool and A. Yao. CVPR 2018. [paper][code]

2017

Efficient unsupervised temporal segmentation of motion data. B. Krüger, A. Vögele, T. Willig, A. Yao, R. Klein and A. Weber. IEEE Transactions on Multimedia (TMM), 19(4):797–812, 2017. [paper]
A two-streamed network for estimating fine-scaled depth maps from single RGB images. J. Li, R. Klein and A. Yao. ICCV 2017. [paper]
Data-driven synthesis of hand grasps from 3D object models. S. Majumder, H. Chen and A. Yao. VMV 2017. [paper]
Crossing nets: combining GANs and VAEs with a shared latent space for hand pose estimation. C. Wan, T. Probst, L. Van Gool and A. Yao. CVPR 2017 (spotlight). [paper][code]

2016 and older

Superpixel optimization using higher-order energy. J. Peng, J. Shen, A. Yao and X. Li. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 26(5):917–927, 2016. [paper]
Hand pose estimation from local surface normals. C. Wan, A. Yao and L. Van Gool. ECCV 2016. [paper]
Gesture recognition portfolios for personalization. A. Yao, L. Van Gool and P. Kohli. CVPR 2014. [paper]
Coupled action recognition and pose estimation from multiple views. A. Yao, J. Gall and L. Van Gool. International Journal of Computer Vision (IJCV), 100(1):16–37, 2012. [paper]
Interactive object detection. A. Yao, J. Gall, C. Leistner and L. Van Gool. CVPR 2012. [paper][video]
Hough forests for object detection, tracking, and action recognition. J. Gall, A. Yao, N. Razavi, L. Van Gool and V. Lempitsky. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 33(11):2188–2202, 2011. [paper]
Learning probabilistic non-linear latent variable models for tracking complex activities. A. Yao, J. Gall, L. Van Gool and R. Urtasun. NeurIPS 2011. [paper][supp][video]
Does human action recognition benefit from pose estimation? A. Yao, J. Gall, G. Fanelli and L. Van Gool. BMVC 2011. [paper]
2D Action recognition serves 3D human pose estimation. J. Gall, A. Yao and L. Van Gool. ECCV 2010. [paper][video]
Tracking in broadcast sports. A. Yao, D. Uebersax, J. Gall and L. Van Gool. GCPR 2010. [paper][video]
A Hough transform-based voting framework for action recognition. A. Yao, J. Gall and L. Van Gool. CVPR 2010. [paper][video]
Colour aids late but not early stages of rapid natural scene recognition. A.Y.J. Yao and W. Einhäuser. Journal of Vision (JOV), 8(16):12, 1-13, 2008. [paper]
