Spring 2023 Previous Projects
This course focuses on core techniques and modern advances for integrating different "modalities" into a shared representation or reasoning system. Specifically, these include text, audio, images/videos and action taking.
- Time & Place: 11:00am - 12:20pm on Tu/Th (Doherty Hall 2210)
- Canvas: Lectures and additional details (coming soon)
- Course questions and discussion (Piazza): https://piazza.com/cmu/spring2023/11777
- Project Report Template (GitHub): https://github.com/cmu-mmml/11-777-template
Piazza and Canvas
All course communication will happen via Piazza and Canvas. All videos will be posted to Canvas for offline viewing, though aspects of the class/teaching will be interactive and missing from the Zoom sessions. Questions will be taken in person, not on chat.
Assignments Timeline and Grading
The course is primarily project-based, but there will be readings throughout the course, which are graded only via participation.
Project Timeline and Assignments (70% total): (see links for more details)
|Jan 30||Submit group members (see Piazza)|
|Feb 14||R1||Dataset Proposal and Analysis||(10%)|
|Mar 03||R2||Related Work and Model Proposal||(15%)|
|Mar 31||R3||Baseline Analysis||(15%)|
|May 1||R4||Completed Report||(20%)|
Discussion Participation (20%, with up to 2% bonus):
We will have discussion posts for papers and lectures on Piazza. We ask you to make two thoughtful contributions per week (either asking questions or posting answers) on these posts. 11 weeks x 2 contributions = 22 pts, but we will grade out of 20 points, allowing for 2 pts of extra credit. We will also give credit for asking questions in lectures.
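The participation arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the course's grading script: the function names are hypothetical, the cap of two counted contributions per week is an assumption, and credit for questions asked in lecture is not modeled.

```python
from typing import Sequence

WEEKS = 11             # weeks with discussion posts
PER_WEEK_TARGET = 2    # contributions requested each week
GRADED_OUT_OF = 20     # points that correspond to full credit
WEIGHT_PERCENT = 20.0  # share of the course grade


def participation_points(weekly_contributions: Sequence[int]) -> int:
    """Count participation points from per-week contribution counts.

    Assumption: at most PER_WEEK_TARGET contributions count in any one week,
    so extra posts in one week do not make up for a missed week.
    """
    counted = weekly_contributions[:WEEKS]
    return sum(min(max(c, 0), PER_WEEK_TARGET) for c in counted)


def participation_percent(weekly_contributions: Sequence[int]) -> float:
    """Convert points into percentage points of the course grade."""
    points = participation_points(weekly_contributions)
    return WEIGHT_PERCENT * points / GRADED_OUT_OF
```

Under these assumptions, two contributions in each of the 11 weeks give 22 points, which is the 20% plus the 2% bonus.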
Paper Summaries (10%):
Writing a three-sentence summary of the paper you read earns you 1 pt. The summary is submitted in three text boxes: A. state the goal of the paper, B. explain the key insight, C. state a key limitation or important extension. There will be 11 opportunities, so one bonus point can be earned (11%). Paper summaries are due the following Tuesday night (1 week after being assigned).
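As a rough sketch of how the summary points add up, the snippet below models one submission as three text fields and counts one point per complete summary. The class and function names are hypothetical, and treating a summary as complete only when all three boxes are non-empty is an assumption rather than a stated rule.

```python
from dataclasses import dataclass
from typing import Sequence

OPPORTUNITIES = 11     # number of assigned papers
GRADED_OUT_OF = 10     # points that correspond to full credit
WEIGHT_PERCENT = 10.0  # share of the course grade


@dataclass
class PaperSummary:
    goal: str                     # box A: the goal of the paper
    key_insight: str              # box B: the key insight
    limitation_or_extension: str  # box C: a key limitation or extension

    def is_complete(self) -> bool:
        # Assumption: a summary earns its point only if every box has text.
        parts = (self.goal, self.key_insight, self.limitation_or_extension)
        return all(part.strip() for part in parts)


def summary_percent(summaries: Sequence[PaperSummary]) -> float:
    """Percentage points of the course grade earned from paper summaries."""
    points = sum(1 for s in summaries[:OPPORTUNITIES] if s.is_complete())
    return WEIGHT_PERCENT * points / GRADED_OUT_OF
```

With all 11 summaries complete, this yields 11%, matching the one bonus point described above.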
- All deadlines are midnight EST (determined by Canvas submission)
- Project reports are graded as a group (single PDF submission), while all other grades are individual.
- Late days: Every team has a budget of 6 late days. Late days are calculated automatically; once the budget is used up, 2% absolute is removed from the max grade.
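A team that wants to track its own late-day budget could use something like the sketch below. It is not the calculation Canvas or the instructors perform: the helper names are hypothetical, and counting every started 24-hour period after the deadline as one late day is an assumption. The grade penalty itself is deliberately left out.

```python
import math
from datetime import datetime
from typing import Sequence

LATE_DAY_BUDGET = 6  # late days available to each team


def late_days_used(deadline: datetime, submitted: datetime) -> int:
    """Late days consumed by one submission.

    Assumption: each started 24-hour period past the deadline is one day.
    """
    delay_seconds = (submitted - deadline).total_seconds()
    if delay_seconds <= 0:
        return 0
    return math.ceil(delay_seconds / 86400)


def remaining_budget(days_used_per_report: Sequence[int]) -> int:
    """Late days still available after the given reports."""
    return max(0, LATE_DAY_BUDGET - sum(days_used_per_report))


def days_over_budget(days_used_per_report: Sequence[int]) -> int:
    """Late days taken beyond the budget, where the penalty would apply."""
    return max(0, sum(days_used_per_report) - LATE_DAY_BUDGET)
```

For example, a report due on the night of Feb 14 and submitted the next morning would use one late day under this counting rule.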
Tasks & Datasets
The course will be primarily centered on a few datasets/tasks to facilitate cross-team collaboration and technical assistance. If your team has a good reason to work on something else, please reach out so we can discuss it and put together a proposal.
|Room-Across-Room||Code||Multilingual Embodied Navigation|
|ALFRED||Code||Embodied instruction following with interaction|
Question Answering & Captioning
|TextVQA||Code||Text in images (referring expressions and reading)|
|WebQA||Code||Multihop Visual QA|
|VizWiz||VQA and Captioning||Visual models for blind users|
|Complex reasoning about pairs of images|
|CompGuessWhat?!||Visual Guessing Game and Attribute Prediction|
|Spoken Image Captions||A series of audio corpora and corresponding images for connecting audio directly to image regions.|
|TVQA||Video Question Answering Dataset|
|VATEX||Multilingual Video Captioning and Translation|
Physical hardware / robots / sensors ...
|What about physical hardware? robots? tasks not datasets? Let's talk.|
Compute
Limited AWS compute credits will be made available to each student, so please consider both your interests and available compute resources when deciding on a dataset/project.