Spring 2021 Previous Projects
This course focuses on core techniques and modern advances for integrating different "modalities" into a shared representation or reasoning system. Specifically, these include text, audio, images/videos and action taking.
- Time & Place: 10:40am - 12:00pm on Zoom Tu/Th
- Canvas: Lectures and additional details
- Course questions and discussion: Slack
- GitHub Template: https://github.com/ybisk/11-777-template
Slack and Canvas
All course communication will happen via Slack and Canvas. All videos will be posted to Canvas for offline viewing, though aspects of the class/teaching will be interactive in the Zoom sessions.
Slack
- #general-questions: For questions about lectures, the course, or help from others on class projects
- #group-N-X: Each group should come up with a name and create their own private channel (invite the TAs and instructor). Use the same name for your GitHub fork and pin the link to the channel. Please also invite us to the GitHub repository. Example: #group-fun-vizwiz
- #dataset-XYZ: Each core dataset will also have its own Slack channel that anyone can join (across groups) to ask for help on setup, preprocessing, and other issues that might arise.
- Private Messages: If there is a question you would like to address to the instructors, please create a 4-person PM on Slack. Please check #general-questions first and post there when possible.
Assignments Timeline and Grading
The course is primarily project based, but there will be readings throughout the course which are only graded via participation.
Project Timeline and Assignments:
Date | Report | Deliverable | Weight |
---|---|---|---|
Feb 18 | | Group Formed and Dataset Chosen | |
Mar 04 | R1 | Task Definition and Data Analysis | (10%) |
Mar 11 | R2 | Related Work and Background | (10%) |
Mar 18 | R3 | Baselines, Metrics, and Empty Results Table | (10%) |
Apr 01 | R4 | Analysis of Baselines | (10%) |
Apr 22 | R5 | Proposed Approach | (10%) |
May 11 | | Presentation | (10%) |
May 13 | R6 | Completed Report | (20%) |
Participation:
Participation in Class or Slack (20%)
Participation is evaluated as "actively asking/answering questions based on the lectures, readings, and/or assisting other teams with project issues". Concretely, this means that every novel question or helpful answer provided in Slack will count for 1%, up to a total of 20% of your grade.
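As a rough illustration of how the weights above combine, here is a minimal Python sketch. The weights and the 1%-per-contribution cap come from this page; the function names and the idea of scoring each deliverable as a fraction between 0 and 1 are assumptions made for the example, not the official grading script.

```python
# Weights in percent, taken from the project timeline and participation policy.
WEIGHTS = {
    "R1": 10, "R2": 10, "R3": 10, "R4": 10, "R5": 10,
    "Presentation": 10, "R6": 20, "Participation": 20,
}


def participation_percent(contributions: int) -> int:
    """1% per novel question or helpful answer, capped at 20%."""
    return min(max(contributions, 0), WEIGHTS["Participation"])


def course_total(scores: dict, contributions: int) -> float:
    """Combine deliverable scores with participation into a percentage.

    `scores` (hypothetical format) maps a deliverable name to the fraction
    of its credit earned, in [0, 1]; missing deliverables count as 0.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        if name == "Participation":
            continue  # handled separately through the contribution count
        total += weight * scores.get(name, 0.0)
    return total + participation_percent(contributions)
```

For example, `course_total({"R1": 0.5}, 3)` adds 5 points for half credit on R1 and 3 points for three Slack contributions.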
Submission Policies:
- All deadlines are midnight EST (determined by GitHub commit time)
- Late days: Every team has a budget of 6 late days. Late days are calculated automatically from the git commit time; once the budget is used up, 2% absolute is removed from the maximum grade.
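To check your own late-day usage, a sketch like the following may help. It is not the course's actual script and rests on several assumptions that the policy does not spell out: the deadline is taken to be the midnight that ends the due date, "EST" is read literally as a fixed UTC-5 offset, and any started 24-hour period past the deadline counts as one late day. The commit time is expected as an ISO 8601 string with a UTC offset, such as the output of `git log -1 --format=%cI`.

```python
import math
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))  # assumption: fixed UTC-5, no daylight saving
LATE_DAY_BUDGET = 6                  # per team, from the submission policy


def late_days_used(deadline_date: str, commit_time: str) -> int:
    """Count late days for one deliverable.

    deadline_date: "YYYY-MM-DD" (the due date listed in the timeline).
    commit_time:   ISO 8601 timestamp with a UTC offset.
    Assumption: each started 24-hour period after the deadline is one day.
    """
    day_start = datetime.strptime(deadline_date, "%Y-%m-%d").replace(tzinfo=EST)
    deadline = day_start + timedelta(days=1)  # midnight at the end of the due date
    commit = datetime.fromisoformat(commit_time)
    overdue_seconds = (commit - deadline).total_seconds()
    if overdue_seconds <= 0:
        return 0
    return math.ceil(overdue_seconds / 86400)


def remaining_budget(used_per_report: list) -> int:
    """Late days left after those already used (negative means over budget)."""
    return LATE_DAY_BUDGET - sum(used_per_report)
```

Under these assumptions, a commit made 34 hours after the deadline uses two late days.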
Datasets
The course will be primarily centered on a few datasets/tasks to facilitate cross-team collaboration and technical assistance. If your team has a good reason to work on something not listed here, please reach out so we can discuss it and put together a proposal.
Paper | GitHub | Domain |
---|---|---|
ALFRED | Code, Challenge | Embodied instruction following with interaction |
MEmoR | Code (ask TA for data) | Emotion in conversational context on The Big Bang Theory |
Natural Language for Visual Reasoning | Code v1, Code v2 | Visual reasoning about pairs of images and language descriptions. For NLVR2, look at the Contrastive Sets fold. |
Room-Across-Room | Code | Embodied instruction following (with views and multilingual) |
Social-IQ | Code, Project page | Video Question Answering focused on social interactions |
VizWiz Challenge | Challenge | Image Captioning and Question Answering for the blind and visually impaired |
Lectures
Tuesday | Thursday |
---|---|
Feb 2: Course Structure | Feb 4: Multimodal applications and datasets |
Feb 9: Basics: "Deep learning" | Feb 11: Basics: Optimization |
Feb 16: Unimodal representations (Vision) | Feb 18: Unimodal representations (Language) |
Feb 23: -- NO CLASS -- | Feb 25: Project Hours (Reports 1&2) |
Mar 2: Multimodal & Coordinated Representations | Mar 4: Alignment and Attention |
Mar 9: Alignment + Representation | Mar 11: Project Hours (Report 3) |
Mar 16: Project Hours (Report 3) | Mar 18: Alignment + Representation (Cont) |
Mar 23: Alignment + Translation | Mar 25: Embodiment |
Mar 30: Reinforcement Learning | Apr 1: Multimodal RL |
Apr 6: Project Hours (Report 5) | Apr 8: Project Hours (Report 5) |
Apr 13: Fusion and co-learning | Apr 15: -- NO CLASS -- |
Apr 20: New research directions | Apr 22: Affective Computing (Torsten Wörtwein) |
Apr 27: Project Hours (Final) | Apr 29: Project Hours (Final) |
May 4: Guest Lecture (Mark Yatskar - UPenn): Bias and Structure in Vision-and-Language | May 6: Guest Lecture (Malihe Alikhani - Pitt): Coherence and Grounding in Multimodal Communication |
May 11: Project Presentations (live) | May 13: -- NO CLASS -- |