Spring 2023 Previous Projects
This course focuses on core techniques and modern advances for integrating different "modalities" into a shared representation or reasoning system. Specifically, these include text, audio, images/videos and action taking.
- Time & Place: 11:00am - 12:20pm on Tu/Th (Doherty Hall 2210)
- Canvas: Lectures and additional details (coming soon)
- Course questions and discussion (Piazza): https://piazza.com/cmu/spring2023/11777
- Project Report Template (GitHub): https://github.com/cmu-mmml/11-777-template
Piazza and Canvas
All course communication will happen via Piazza and Canvas. All videos will be posted to Canvas for offline viewing, though aspects of the class and teaching will be interactive and missing from the Zoom sessions. Questions will be taken in person, not on chat.
Piazza: https://piazza.com/cmu/spring2023/11777
Assignments Timeline and Grading
The course is primarily project-based, but there will be readings throughout the course, which are graded only via participation.
Project Timeline and Assignments (70% total): (see links for more details)
Jan 30 | | Submit group members (see Piazza) | |
Feb 14 | R1 | Dataset Proposal and Analysis | (10%) |
Mar 03 | R2 | Related Work and Model Proposal | (15%) |
Mar 31 | R3 | Baseline Analysis | (15%) |
Apr 25/27 | | Presentation | (10%) |
May 1 | R4 | Completed Report | (20%) |
Discussion Participation (20%, with up to 2% bonus):
We will have discussion posts for papers and lectures on Piazza. We ask you to make two thoughtful contributions per week (either asking questions or posting answers) on these posts. 11 weeks x 2 contributions = 22 pts, but we will grade out of 20 points, allowing for 2 pts of extra credit. We will also give credit for asking questions in lectures.
Paper Summaries (10%):
Writing a three-sentence summary of the paper you read earns you 1 pt. The summary is submitted in three text boxes: (A) state the goal of the paper, (B) explain the key insight, and (C) state a key limitation or important extension. There will be 11 opportunities, so one bonus point can be earned (for a maximum of 11%). Paper summaries are due the following Tuesday night (1 week after being assigned).
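As a rough illustration of how the weights above add up, here is a minimal Python sketch. It is not an official grade calculator: the function name `course_grade`, the idea of passing each project component as a fraction earned, and the assumption that each participation or summary point maps to one percentage point are all illustrative choices, and late-day penalties are not modelled.

```python
# Project component weights in percentage points, taken from the timeline above.
PROJECT_WEIGHTS = {"R1": 10, "R2": 15, "R3": 15, "Presentation": 10, "R4": 20}

MAX_PARTICIPATION_POINTS = 22  # 11 weeks x 2 contributions (20 base + 2 bonus)
MAX_SUMMARY_POINTS = 11        # 11 opportunities (10 base + 1 bonus)


def course_grade(project_fractions, contributions, summaries):
    """Return an illustrative overall percentage (hypothetical helper).

    project_fractions: dict mapping a component name to the fraction
        (0.0 to 1.0) of that component's weight that was earned.
    contributions: number of credited discussion contributions
        (assumed here to be worth 1 pt each).
    summaries: number of accepted paper summaries (1 pt each).
    """
    project = sum(
        weight * project_fractions.get(name, 0.0)
        for name, weight in PROJECT_WEIGHTS.items()
    )
    # Clamp to the stated maxima so extra posts cannot exceed the bonus cap.
    participation = min(max(contributions, 0), MAX_PARTICIPATION_POINTS)
    summary = min(max(summaries, 0), MAX_SUMMARY_POINTS)
    return project + participation + summary
```

Under these assumptions, full marks on every project component plus all 22 contributions and all 11 summaries would give 70 + 22 + 11 = 103 percentage points, which matches the stated 100% plus 2% and 1% of possible bonus.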
Submission Policies:
- All deadlines are midnight EST (determined by Canvas submission)
- Project reports are graded as a group (single PDF submission), while all other grades are individual.
- Late days: Every team has a budget of 6 late days. Late days are counted automatically; once the budget is used up, 2% absolute is deducted from the maximum grade.
Tasks & Datasets
The course will be primarily centered on a few datasets/tasks to facilitate cross-team collaboration and technical assistance. If your team has a good reason to work on something else, please reach out so we can discuss it and put together a proposal.
Simulator Based
Room-Across-Room | Code | Multilingual Embodied Navigation |
ALFRED | Code | Embodied instruction following with interaction |
Question Answering & Captioning
TextVQA | Code | Text in images (referring expressions and reading) |
WebQA | Code | Multihop Visual QA |
VizWiz | VQA and Captioning | Visual models for blind users |
NLVR2 | Code Proj page | Complex reasoning about pairs of images |
Multi-turn QA
CompGuessWhat?! | Visual Guessing Game and Attribute Prediction |
Audio
Spoken Image Captions | A series of audio corpora and corresponding images for connecting audio directly to image regions. |
Video
TVQA | Video Question Answering Dataset | |
VATEX | Multilingual Video Captioning and Translation |
Physical hardware / robots / sensors ...
What about physical hardware? robots? tasks not datasets? Let's talk. |
Compute
Limited AWS compute credits will be made available to each student, so please consider both your interests and available compute resources when deciding on a dataset/project.
Lectures (tentative)
Tuesday | Thursday |
---|---|
Jan 17: Course Structure | Jan 19: Multimodal applications and datasets |
Overview Readings: | |
Jan 24: Unimodal representations (Vision) | Jan 26: Unimodal representations (Language) |
Readings: | |
Jan 31: Representation | Feb 02: Representation |
Readings: | |
Feb 07: Alignment + Grounding | Feb 09: Aligned and Attended |
Readings: | |
Feb 14: Multimodal Transformers | Feb 16: Multimodal Transformers II |
Feb 21: Embodiment | Feb 23: RL, Logic, and Causality |
Feb 28: Project Hours | Mar 02: Project Hours |
Mar 07: Spring Break! | Mar 09: Spring Break! |
Mar 14: Embodiment (cont) | Mar 16: Quantification and Bias |
Mar 21: Generation + Translation | Mar 23: Generation |
Mar 28: Transference | Mar 30: Guest Lecture by Maarten Sap on Social Bias |
Apr 04: Project Hours | Apr 06: Project Hours |
Apr 11: Guest Lecture by Aida Nematzadeh | Apr 13: Carnival (no class) |
Main papers: Additional papers: | |
Apr 18: New research directions | Apr 20: Guest Lecture by Alane Suhr |
Apr 25: Final Presentations | Apr 27: Final Presentations |