Spring 2023: CS 6384 Computer Vision

Understand the 3D world from 2D images

Course Information

Term: Spring 2023
Class Level: Graduate
Activity Type: Lecture
Days & Times: Monday & Wednesday 1:00 PM – 2:15 PM
Location: ECSW 3.210

Instructor: Prof. Yu Xiang
Office Location: ECSS 4.702
Office Hours: Monday & Wednesday 3:30 PM – 4:30 PM

Teaching Assistant: Jishnu P
Office Hours: Tuesday 3:00 PM – 4:00 PM

Syllabus

All the course materials can be found here.

Course Description

Theory and practice of computer vision. Provides an in-depth overview of computer vision, including geometric primitives and transformations, camera models, image features, epipolar geometry and stereo, structure from motion and SLAM, 3D reconstruction, variations of modern neural networks, and various recognition problems such as object detection, semantic segmentation, and human pose estimation.

Textbooks

Richard Szeliski. Computer Vision: Algorithms and Applications, 1st Edition. Springer, 2011.
ISBN-13: 978-1848829343
ISBN-10: 1848829345
Second Edition Draft

David Forsyth, Jean Ponce. Computer Vision: A Modern Approach, 2nd Edition. Pearson, 2011. (Optional)
ISBN: 9789332550117

Richard Hartley, Andrew Zisserman. Multiple View Geometry in Computer Vision, 2nd Edition. Cambridge University Press, 2004. (Optional)
ISBN-13: 978-0521540513
ISBN-10: 0521540518

Grading Policy

  • Homework (50%)
    • Assignment 1 (10%)
    • Assignment 2 (10%)
    • Assignment 3 (10%)
    • Assignment 4 (10%)
    • Assignment 5 (10%)
  • Team Project (45%)
    • Project proposal (5%)
    • Project mid-term report (10%)
    • Project presentation (15%)
    • Project final report (15%)
  • In-class Activity (5%)

Project

  • Project proposal description (PDF)
  • Project mid-term report requirement (PDF)
  • Project presentation and final report requirement (PDF)

Homework

Guest Lecturer

Dr. Saining Xie from NYU will talk about representation learning and visual recognition on 4/26/2023.
Title: Scalable Visual Pre-training: Past, Present, and Paths Forward
Abstract
The remarkable visual recognition abilities of humans are rooted in a strong biological and cognitive foundation. In the past decade, deep learning has made significant strides across numerous domains, with representation learning playing a key role in this progress. This area focuses on learning efficient, accurate, and robust representations from raw data that can be utilized by a downstream classifier or predictor. Contemporary deep learning systems consist of two intertwined core components: 1) neural network architectures and 2) representation learning algorithms. In this lecture, we will explore several studies in both areas. We will delve into modern network design principles and their impact on the scaling behavior of ConvNets and Vision Transformers. We will also examine our endeavors to transcend the conventional supervised learning paradigm, showcasing how self-supervised visual representation learning can surpass supervised learning in various visual recognition tasks. Our discussion will cover a range of vision application domains and modalities (e.g., 2D images, 3D scenes and vision+language), illustrating the connections between techniques tailored for different input modalities and shedding light on the distinct challenges each modality presents. Lastly, we will address several critical challenges and opportunities that the “Large Pre-trained Model” era presents for computer vision research.

Lectures

Date | Topic
Week 1, 1/16 | Martin Luther King Day
Week 1, 1/18, Lecture 1 | Introduction to Computer Vision (slides)
Week 2, 1/23, Lecture 2 | Image Formulation: Geometric Primitives and Transformations (slides)
Week 2, 1/25, Lecture 3 | Image Formulation: 3D Rotations (slides)
Week 3, 1/30, Lecture 4 | Image Formulation: Camera Models (slides)
Week 3, 2/1 | Cancelled due to weather conditions
Week 4, 2/6, Lecture 5 | Image Formulation: Visual Rendering: Vertex Transforms (slides)
Week 4, 2/8, Lecture 6 | Image Formulation: Visual Rendering: Rasterization, Lighting and Shading, Fragment Processing (slides)
Week 5, 2/13, Lecture 7 | Feature Detection and Matching: Keypoint Features: Image Convolution and Harris Corner Detector (slides)
Week 5, 2/15, Lecture 8 | Feature Detection and Matching: Keypoint Features: Scale Invariance and SIFT (slides)
Week 6, 2/20, Lecture 9 | Feature Detection and Matching: Edges, Contours and Lines (slides)
Week 6, 2/22, Lecture 10 | 3D Vision: Camera Calibration and Pose Estimation (slides)
Week 7, 2/27, Lecture 11 | 3D Vision: Epipolar Geometry and Stereo (slides)
Week 7, 3/1, Lecture 12 | 3D Vision: Structure from Motion and SLAM (slides)
Week 8, 3/6, Lecture 13 | 3D Vision: 3D Reconstruction (slides)
Week 8, 3/8, Lecture 14 | Deep Learning: Convolutional Neural Networks I (slides)
Week 9, 3/13 | Spring Break
Week 9, 3/15 | Spring Break
Week 10, 3/20, Lecture 15 | Deep Learning: Convolutional Neural Networks II (slides)
Week 10, 3/22, Lecture 16 | Deep Learning: Recurrent Neural Networks (slides)
Week 11, 3/27, Lecture 17 | Deep Learning: Transformers (slides)
Week 11, 3/29, Lecture 18 | Deep Learning: Generative Neural Networks (slides)
Week 12, 4/3, Lecture 19 | Deep Learning: Neural Networks for 3D Data (slides)
Week 12, 4/5, Lecture 20 | Recognition: Visual Representation Learning (slides)
Week 13, 4/10, Lecture 21 | Recognition: Optical Flow and Correspondences (slides)
Week 13, 4/12, Lecture 22 | Recognition: Object Detection (slides)
Week 14, 4/17, Lecture 23 | Recognition: Semantic Segmentation (slides)
Week 14, 4/19, Lecture 24 | Recognition: Pose Estimation of Objects, Humans and Hands (slides)
Week 15, 4/24, Lecture 25 | Recognition: Images and Languages (slides)
Week 15, 4/26 | Guest Lecture: Dr. Saining Xie. Scalable Visual Pre-training: Past, Present, and Paths Forward (slides)
Week 16, 5/1 | Project Presentation I
  • Group 2: Lane and Obstacle Detection for Autonomous Vehicles (slides, demo)
  • Group 3: Human Pose Estimation based Posture Corrector (slides, demo)
  • Group 6: Image Search Engine (slides)
  • Group 7: Hand Gesture Recognition for Interaction with Computers (slides)
  • Group 8: Verification of Identity using Triplet Network (slides)
  • Group 9: Memento: Object Detection and Tracking for Memory Recall (slides, demo)
  • Group 10: Vehicle Detection, Classification and Counting (slides)
  • Group 11: Human Movement Analysis for Sports Performance Evaluation (slides, demo)
Week 16, 5/3 | Project Presentation II
  • Group 5: Mask Detection and Social Distance Evaluation (slides)
  • Group 12: Create Video Clips using Frame Interpolation (slides)
  • Group 13: Itemization of Receipts Using Computer Vision Techniques (slides)
  • Group 14: Real time Alertness assessment using CNN and Viola-Jones Algorithm (slides)
  • Group 15: Parking Spot Detection (slides, demo)
  • Group 16: YOGA Master (slides, demo1, demo2)
  • Group 17: ZEN: A Cross architecture Generalizable Dataset Distillation Approach (slides)
  • Group 18: naviVision (slides, demo)
  • Group 19: Learning based 3D Representation and Rendering (slides, demo)
