Imagine a world where computer vision-based systems can analyze a video of an athlete, a surgeon, a patient, or a factory worker and instantly provide expert-level, actionable feedback: correcting technique, identifying inefficiencies, and helping people refine their skills in real time. Thanks to rapid progress in video understanding, this vision is becoming a reality. AI-powered systems can now analyze complex human activities, assess performance, and generate intelligent feedback, unlocking new possibilities in sports, healthcare, manufacturing, education, rehabilitation, and beyond. Through Expert Keynotes and Invited Contributions, this CVPR 2026 workshop will explore the cutting edge of skilled activity understanding, assessment, and feedback generation, bridging research and real-world applications.
As AI systems become more capable of understanding human expertise, the implications are profound: personalized coaching, democratized skill development, and scalable training solutions. We invite researchers, industry leaders, and practitioners to join us in shaping the future of AI-powered skill understanding. Whether you work on foundational research, applied solutions, or real-world deployment, this workshop is a forum to learn about, and push the boundaries of, how AI perceives, evaluates, and enhances human ability.
Date & Time: June 4, 2026 | 13:00-18:00 (Afternoon Session)
Venue: Hall 705/707, Denver Convention Center
Past edition: ICCV 2025
Dr. Walterio Mayol-Cuevas (University of Bristol + Amazon)
Dr. Anwesa Choudhuri (United Imaging Intelligence)
Dr. Guodong Ding (National University of Singapore)
We invite contributions related to, but not limited to, the following areas:
We have the following two paper tracks. Please use this OpenReview link to submit your papers, and select the appropriate track in OpenReview. We recommend registering your OpenReview account with an institutional email, as accounts registered with non-institutional emails can take up to 2 weeks to be activated.
Decisions: March 31, 2026 (all times are Anywhere on Earth (AoE))
Notifications sent to authors: March 31, 2026
Camera-ready deadline: April 9, 2026. Instructions have been emailed to authors.
1) Proceedings Track. Papers accepted to this track will be published in the official proceedings. Please use the CVPR 2026 paper template. The page limit is 8 pages, excluding references. Please strictly follow the CVPR 2026 paper guidelines. Supplementary material is allowed. We will follow a single-stage review process, with no rebuttal stage. Tentative submission deadline: March 21, 2026. Camera-ready deadline: April 9, 2026.
2) Non-Proceedings Track. These papers will be presented at the workshop but not included in the proceedings, so they can also be submitted to future venues. We also invite papers previously accepted to CVPR 2026 or any other venue. This track lets you share your work and receive feedback that can make it stronger. Work-in-progress or late-breaking work that sparks new ideas and directions, with or without an accompanying evaluation, can also be submitted. Submissions to the Non-Proceedings Track do not need to be anonymized. Supplementary material is allowed. We will follow a single-stage review process, with no rebuttal stage. Tentative submission deadline: March 21, 2026. We had considered opening a second round for the Non-Proceedings Track after the Round 1 review; however, since we received a strong number of submissions during the initial round, we will not hold an official second round of submissions. If you are interested in presenting your work, please feel free to reach out to us directly, and we will be happy to discuss potential opportunities.
Please let us know if you face any problems or have any questions.
| Time | Program |
|---|---|
| 13:00 – 13:15 | Opening Remarks |
| 13:15 – 13:45 | Keynote-1: Dr. Walterio Mayol-Cuevas — Topic: Skill Evolution |
| 13:45 – 14:05 | Keynote-2: Dr. Anwesa Choudhuri — Topic: MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding |
| 14:05 – 14:35 | Group 1 Oral Presentations + grouped QA |
| 14:35 – 15:05 | Group 2 Oral Presentations + grouped QA |
| 15:05 – 15:20 | Coffee Break |
| 15:20 – 15:50 | Keynote-3: Dr. Guodong Ding — Topic: From Action Segmentation to Skill Understanding |
| 15:50 – 16:20 | Group 3 Oral Presentations + grouped QA |
| 16:20 – 16:50 | Group 4 Oral Presentations + grouped QA |
| 16:50 – 16:55 | Closing Remarks |
| 16:55 – 18:00 | Poster Session in ExHall A (please don't hang posters beforehand, as boards are shared among workshops this year. Poster boards 281-292 (front and back) are ours. You may use any board of your choice; we are not following any particular ordering.) |

| Group 1 |
|---|
| 1. Behavior-based Skill Assessment for Open Surgery from Multi-view and Egocentric Videos |
| 2. SkillSight: Efficient First-Person Skill Assessment with Gaze |
| 3. Can Vision Language Models Judge Action Quality? An Empirical Evaluation |
| 4. Local Brushstroke Quality Assessment via Vision-Language Feedback |
| 5. Toward Fine-Grained Basketball Action Understanding via Broadcast–Tracking Sensor Fusion |

| Group 2 |
|---|
| 1. How to Take a Memorable Picture? Empowering Users with Actionable Feedback |
| 2. Sports Form Anomaly Detection Using Skeleton-PatchCore |
| 3. Learning Coach’s Tactics: Skill-aware Readiness Modeling for Interactive Badminton Training |
| 4. Structured Relational Reasoning for Group Activity Assessment |

| Group 3 |
|---|
| 1. ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos |
| 2. Combining Boundary Supervision and Segment-Level Regularization for Fine-Grained Action Segmentation |
| 3. Distance-Aware Attention for Surgical Task Classification with Video Transformers |
| 4. EgoIndAssembly: Diagnosing Fine-Grained Temporal Hand Reasoning in Vision–Language Models for Skilled Assembly |
| 5. UDAPose: Unsupervised Domain Adaptation for Low-Light Human Pose Estimation |

| Group 4 |
|---|
| 1. A Study of Failure Modes in Two-Stage Human–Object Interaction Detection |
| 2. Efficient and Intrinsically Interpretable Spatiotemporal Transformer with Gated Fusion for Group Activity Recognition |
| 3. DUAL-Pose: Efficient Dual-Branch Graph Networks for Skeleton-Based Yoga Pose Recognition |
| 4. Counterfactual Action Quality Assessment for Explainable Skill Feedback |