Spring 2023: CS 6384 Computer Vision – Intelligent Robotics and Vision Lab at the University of Texas at Dallas

Understand the 3D world from 2D images

Course Information

Term: Spring 2023
Class Level: Graduate
Activity Type: Lecture
Days & Times: Monday & Wednesday 1:00 PM – 2:15 PM
Location: ECSW 3.210

Instructor: Prof. Yu Xiang
Office Location: ECSS 4.702
Office Hours: Monday & Wednesday 3:30 PM – 4:30 PM

Teaching Assistant: Jishnu P
Office Hours: Tuesday 3:00 PM – 4:00 PM

Syllabus

All the course materials can be found here.

Course Description

Theory and practice of computer vision. Provides an in-depth overview of computer vision, including geometric primitives and transformations, camera models, image features, epipolar geometry and stereo, structure from motion and SLAM, 3D reconstruction, variations of modern neural networks, and various recognition problems such as object detection, semantic segmentation, and human pose estimation.

Textbooks

Richard Szeliski. Computer Vision: Algorithms and Applications, 1st Edition. Springer, 2011.
ISBN-13: 978-1848829343
ISBN-10: 1848829345
Second Edition Draft

David Forsyth, Jean Ponce. Computer Vision: A Modern Approach, 2nd Edition. Pearson, 2011. (Optional)
ISBN: 9789332550117

Richard Hartley, Andrew Zisserman. Multiple View Geometry in Computer Vision, 2nd Edition. Cambridge University Press, 2004. (Optional)
ISBN-13: 978-0521540513
ISBN-10: 0521540518

Grading Policy

Homework (50%)
- Assignment 1 (10%)
- Assignment 2 (10%)
- Assignment 3 (10%)
- Assignment 4 (10%)
- Assignment 5 (10%)
Team Project (45%)
- Project proposal (5%)
- Project mid-term report (10%)
- Project presentation (15%)
- Project final report (15%)
In-class Activity (5%)

Project

Project proposal description (PDF)
Project mid-term report requirement (PDF)
Project presentation and final report requirement (PDF)

Homework

Assignment 1 (PDF, programming)
Assignment 2 (PDF, programming)
Assignment 3 (PDF, programming)
Assignment 4 (PDF, programming)
Assignment 5 (PDF, programming)
Learning from the “Cracker Box”

Guest Lecturer

Dr. Saining Xie from NYU will talk about representation learning and visual recognition on 4/26/2023.
Title: Scalable Visual Pre-training: Past, Present, and Paths Forward
Abstract:
The remarkable visual recognition abilities of humans are rooted in a strong biological and cognitive foundation. In the past decade, deep learning has made significant strides across numerous domains, with representation learning playing a key role in this progress. This area focuses on learning efficient, accurate, and robust representations from raw data that can be utilized by a downstream classifier or predictor. Contemporary deep learning systems consist of two intertwined core components: 1) neural network architectures and 2) representation learning algorithms. In this lecture, we will explore several studies in both areas. We will delve into modern network design principles and their impact on the scaling behavior of ConvNets and Vision Transformers. We will also examine our endeavors to transcend the conventional supervised learning paradigm, showcasing how self-supervised visual representation learning can surpass supervised learning in various visual recognition tasks. Our discussion will cover a range of vision application domains and modalities (e.g., 2D images, 3D scenes and vision+language), illustrating the connections between techniques tailored for different input modalities and shedding light on the distinct challenges each modality presents. Lastly, we will address several critical challenges and opportunities that the “Large Pre-trained Model” era presents for computer vision research.

Lectures

Date	Topic
Week 1, 1/16	Martin Luther King Day
Week 1, 1/18, Lecture 1	Introduction to Computer Vision (slides)
Week 2, 1/23, Lecture 2	Image Formation: Geometric Primitives and Transformations (slides)
Week 2, 1/25, Lecture 3	Image Formation: 3D Rotations (slides)
Week 3, 1/30, Lecture 4	Image Formation: Camera Models (slides)
Week 3, 2/1	Cancelled due to weather conditions
Week 4, 2/6, Lecture 5	Image Formation: Visual Rendering: Vertex Transforms (slides)
Week 4, 2/8, Lecture 6	Image Formation: Visual Rendering: Rasterization, Lighting and Shading, Fragment Processing (slides)
Week 5, 2/13, Lecture 7	Feature Detection and Matching: Keypoint Features: Image Convolution and Harris Corner Detector (slides)
Week 5, 2/15, Lecture 8	Feature Detection and Matching: Keypoint Features: Scale Invariance and SIFT (slides)
Week 6, 2/20, Lecture 9	Feature Detection and Matching: Edges, Contours and Lines (slides)
Week 6, 2/22, Lecture 10	3D Vision: Camera Calibration and Pose Estimation (slides)
Week 7, 2/27, Lecture 11	3D Vision: Epipolar Geometry and Stereo (slides)
Week 7, 3/1, Lecture 12	3D Vision: Structure from Motion and SLAM (slides)
Week 8, 3/6, Lecture 13	3D Vision: 3D Reconstruction (slides)
Week 8, 3/8, Lecture 14	Deep Learning: Convolutional Neural Networks I (slides)
Week 9, 3/13	Spring Break
Week 9, 3/15	Spring Break
Week 10, 3/20, Lecture 15	Deep Learning: Convolutional Neural Networks II (slides)
Week 10, 3/22, Lecture 16	Deep Learning: Recurrent Neural Networks (slides)
Week 11, 3/27, Lecture 17	Deep Learning: Transformers (slides)
Week 11, 3/29, Lecture 18	Deep Learning: Generative Neural Networks (slides)
Week 12, 4/3, Lecture 19	Deep Learning: Neural Networks for 3D Data (slides)
Week 12, 4/5, Lecture 20	Recognition: Visual Representation Learning (slides)
Week 13, 4/10, Lecture 21	Recognition: Optical Flow and Correspondences (slides)
Week 13, 4/12, Lecture 22	Recognition: Object Detection (slides)
Week 14, 4/17, Lecture 23	Recognition: Semantic Segmentation (slides)
Week 14, 4/19, Lecture 24	Recognition: Pose Estimation of Objects, Humans and Hands (slides)
Week 15, 4/24, Lecture 25	Recognition: Images and Languages (slides)
Week 15, 4/26	Guest Lecture: Dr. Saining Xie, Scalable Visual Pre-training: Past, Present, and Paths Forward (slides)
Week 16, 5/1	Project Presentation I: Group 2: Lane and Obstacle Detection for Autonomous Vehicles (slides, demo); Group 3: Human Pose Estimation based Posture Corrector (slides, demo); Group 6: Image Search Engine (slides); Group 7: Hand Gesture Recognition for Interaction with Computers (slides); Group 8: Verification of Identity using Triplet Network (slides); Group 9: Memento: Object Detection and Tracking for Memory Recall (slides, demo); Group 10: Vehicle Detection, Classification and Counting (slides); Group 11: Human Movement Analysis for Sports Performance Evaluation (slides, demo)
Week 16, 5/3	Project Presentation II: Group 5: Mask Detection and Social Distance Evaluation (slides); Group 12: Create Video Clips using Frame Interpolation (slides); Group 13: Itemization of Receipts Using Computer Vision Techniques (slides); Group 14: Real time Alertness assessment using CNN and Viola-Jones Algorithm (slides); Group 15: Parking Spot Detection (slides, demo); Group 16: YOGA Master (slides, demo1, demo2); Group 17: ZEN: A Cross architecture Generalizable Dataset Distillation Approach (slides); Group 18: naviVision (slides, demo); Group 19: Learning based 3D Representation and Rendering (slides, demo)