CS 6501-003: Deep Learning for Visual Recognition

Instructor: Vicente Ordóñez-Román (vicente at virginia.edu). Office Hours: Tuesdays 3 to 5pm (Rice 310)
Teaching Assistant: Ziyan Yang (zy3cx at virginia.edu) -- Office Hours: Thursdays 3 to 5pm (Rice 442)
Teaching Assistant: Paola Cascante-Bonilla (pc9za at virginia.edu) -- Office Hours: Fridays 2 to 4pm (Rice 442)
Class Time: Mondays & Wednesdays between 3:30PM and 4:45PM, at Olsson Hall 005.
Discussion Forum: Piazza

Course Description: How can we use computers to recognize objects, people, actions, animals, places, etc., in images? This seemingly trivial task, which people perform without much effort, has remained one of the core problems in Computer Vision. Recent advances in representation learning using multiple layers of abstraction (deep learning) have proven to be important for designing artificial systems for visual recognition. In this class we will study, conceive, and implement deep learning models and learning algorithms for computational visual recognition. After this class you will be able to understand, design, implement, and assess the impact of deep learning techniques for a diverse range of visual recognition tasks.

Learning Objectives: (a) Develop intuitions about the connections between human vision and computer vision, (b) Understand foundational concepts for representation learning using neural networks, (c) Become familiar with state-of-the-art models for tasks such as image classification, object detection, image segmentation, scene recognition, etc., and (d) Obtain practical experience in the implementation of visual recognition models using deep learning.

Prerequisites: This course requires no previous background in computer vision or machine learning, but knowledge of either will be helpful. You need to know about matrices, calculating derivatives, and probabilities (Bayes' rule). You will also need to be at least a moderately proficient programmer in Python. There will be several lab assignments. These assignments will show you the basics of modern general visual recognition algorithms and models, and will give you the tools for implementing more advanced ones. We will also have a class project where you will be able to work on something beyond your assignments, with more freedom to pursue a focused problem that interests you and better matches your background. Finally, we will be using Python and PyTorch in the lecture notes, so it helps to have completed a few projects in Python before the class starts. You should install Python, Jupyter, and PyTorch, and complete the following notebook [pytorch_tensors].

Syllabus

Date     Topic 
Mon, January 13th Introduction to Visual Recognition [pptx] [pdf] + Primer on Image Processing [link]
Assignment on Image Processing and Manipulation [Colab]. Due January 26th 5pm EST.
Wed, January 15th Image Processing and Image Manipulations [pptx] [pdf]
Mon, January 20th MLK Holiday -- no class this day  
Assignment on Image Classification [Colab]. Due February 3rd 11:59pm EST.
Wed, January 22nd Softmax Classifier + Stochastic Gradient Descent [pptx] [pdf]
Mon, January 27th Shallow Image Features and the Bag of Features model [pptx] [pdf]  
Assignment on Deep Learning Basics [Colab]. Due February 10th 11:59pm EST.
Wed, January 29th Neural Networks and the Multi-layer Perceptron Model [pptx] [pdf]  
Mon, February 3rd Convolutional Neural Networks (CNNs) [pptx] [pdf]  
Assignment on Convolutional Neural Networks [Colab]. Due February 24th 11:59pm EST.
Wed, February 5th
Speaker: Dr. Catherine Schuman (Oak Ridge National Laboratory)
Guest Lecture: Neuromorphic Computing
More information: Dr. Catherine Schuman works as a Research Scientist at the Oak Ridge National Lab (ORNL) in Tennessee on neuromorphic computing and spiking neural networks. These are models that function in some ways more similarly to processes in the brain and seem promising in terms of efficiency.
Mon, February 10th Convolutional Neural Network Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet [pptx] [pdf]  
Wed, February 12th Deep Learning-based Object Detection [pptx] [pdf]
Mon, February 17th Deep Learning-based Semantic Image Segmentation [pptx] [pdf]
Wed, February 19th Generative Adversarial Networks (GANs)
Mon, February 24th Paper Review: CNNs as Features for Transfer Learning
  • CNN Features off-the-shelf: an Astounding Baseline for Recognition. Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, Stefan Carlsson. CVPR 2014 Workshops. [arxiv] (Presented by Ziyan Yang)
  • Do Better ImageNet Models Transfer Better? Simon Kornblith, Jonathon Shlens, Quoc V. Le. CVPR 2019 [arxiv] (Presented by Paola Cascante-Bonilla)
Wed, February 26th Recurrent Neural Networks (RNNs)
Mon, March 2nd Paper Review: Face Recognition and Pose Estimation
  • Deep Face Recognition. Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. BMVC 2015. [pdf]
  • Deep High-Resolution Representation Learning for Human Pose Estimation. Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang. CVPR 2019 [arxiv].
Wed, March 4th Paper Review: Recent Methods for Object Detection and Instance Segmentation.
  • Mask R-CNN. Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick. ICCV 2017 [arxiv].
  • CornerNet: Detecting Objects as Paired Keypoints. Hei Law, Jia Deng. ECCV 2018. [arxiv]
Mon, March 9th Spring recess -- no class this day  
Wed, March 11th Spring recess -- no class this day  
Mon, March 16th Paper Review: Interpreting and Explaining Deep Neural Networks
  • Network Dissection: Quantifying Interpretability of Deep Visual Representations. David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba. CVPR 2017 [arxiv].
  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra. ICCV 2017. [arxiv]
 
Wed, March 18th Paper Review: Image to Text: Image Captioning
  • Show and Tell: A Neural Image Caption Generator. Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. CVPR 2015 [arxiv].
  • Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang. CVPR 2018. [arxiv]
Mon, March 23rd Paper Review: Structured Prediction with Partial Labels
  • Learning Structured Inference Neural Networks with Label Relations. Hexiang Hu, Guang-Tong Zhou, Zhiwei Deng, Zicheng Liao, Greg Mori. CVPR 2016 [arxiv].
  • Feedback-prop: Convolutional Neural Network Inference under Partial Evidence. Tianlu Wang, Kota Yamaguchi, Vicente Ordonez. CVPR 2018. [arxiv]
Wed, March 25th Paper Review: Efficient CNN Architectures
  • MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. [arxiv]
  • ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun. ECCV 2018 [arxiv].
Mon, March 30th Paper Review: Conditional Generative Adversarial Networks (GANs)
  • Image-to-Image Translation with Conditional Adversarial Networks. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. CVPR 2017 [arxiv].
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros. ICCV 2017. [arxiv]
 
Wed, April 1st Paper Review: Avoiding Visual Bias in Computer Vision
  • Women also Snowboard: Overcoming Bias in Captioning Models. Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach. ECCV 2018 [arxiv].
  • Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations. Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez. ICCV 2019. [arxiv]
 
Mon, April 6th Paper Review: Video Recognition
  • Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. Joao Carreira, Andrew Zisserman. CVPR 2017. [arxiv]
  • SlowFast Networks for Video Recognition. Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He. ICCV 2019. [link]
 
Wed, April 8th Paper Review: Transformer Networks
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. NeurIPS 2019 [arxiv].
  • VisualBERT: A Simple and Performant Baseline for Vision and Language. Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. [arxiv]
 
Mon, April 13th Paper Review: Self-supervised Learning
  • Self-Supervised Learning of Pretext-Invariant Representations. Ishan Misra, Laurens van der Maaten. [arxiv]
  • Momentum Contrast for Unsupervised Visual Representation Learning. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick. [arxiv]
 
Wed, April 15th Paper Review: Colorization and Super-resolution
  • ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang. ECCV 2018 Workshops [arxiv].
  • Learning Diverse Image Colorization. Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, Min Jin Chong, David Forsyth. CVPR 2017. [arxiv]
 
Mon, April 20th Paper Review: Neural Architecture Design and Search
  • Exploring Randomly Wired Neural Networks for Image Recognition. Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He. [arxiv]
  • DARTS: Differentiable Architecture Search. Hanxiao Liu, Karen Simonyan, Yiming Yang. ICLR 2019. [arxiv]
 
Wed, April 22nd Course Recap and What's Next?  
Mon, April 27th Final Project Poster Presentation  

Disclaimer: The professor reserves the right to make changes to the syllabus, including assignment due dates. These changes will be announced as early as possible.

Grading: Assignments: 400pts (4 assignments: 100pts + 100pts + 100pts + 100pts), Class Project: 400pts, Reading Summaries: 100pts, Class Presentation: 100pts. Letter grades to be decided as follows: A+ (1000pts), A (930pts), A- (900pts), B+ (870pts), B (830pts), B- (800pts), C+ (770pts), C (730pts), C- (700pts), D+ (670pts), D (630pts), D- (600pts).

Late Submission Policy: No late assignments will be accepted in this class unless the student has obtained special accommodations.

Academic Integrity Statement: "The School of Engineering and Applied Science relies upon and cherishes its community of trust. We firmly endorse, uphold, and embrace the University’s Honor principle that students will not lie, cheat, or steal, nor shall they tolerate those who do. We recognize that even one honor infraction can destroy an exemplary reputation that has taken years to build. Acting in a manner consistent with the principles of honor will benefit every member of the community both while enrolled in the Engineering School and in the future. Students are expected to be familiar with the university honor code, including the section on academic fraud."

Accessibility Statement: "The University of Virginia strives to provide accessibility to all students. If you require an accommodation to fully access this course, please contact the Student Disability Access Center (SDAC) at (434) 243-5180 or sdac@virginia.edu. If you are unsure if you require an accommodation, or to learn more about their services, you may contact the SDAC at the number above or by visiting their website at https://www.studenthealth.virginia.edu/student-disability-access-center/about-sdac."

Other similar courses or courses with useful related material:

Department of Computer Science, University of Virginia, Spring 2020.