Fall 2024
This course focuses on core techniques and modern advances for integrating different "modalities" into a shared representation or reasoning system. Specifically, these include text, audio, images/videos and action taking.
- Time & Place: 9:30am - 11:00am on Tu/Th (Margaret Morrison A14)
- Canvas: Lectures and additional details (coming soon)
- Course questions and discussion (Piazza): https://piazza.com/cmu/fall2024/11777
- Project Report Template (GitHub): https://github.com/cmu-mmml/11-777-template
Piazza and Canvas
All course communication will happen via Piazza and Canvas. All videos will be posted to Canvas for offline viewing, though some aspects of the class/teaching will be interactive and missing from the Zoom sessions. Questions will be taken in person, not on chat.
Piazza: https://piazza.com/cmu/fall2024/11777
Assignments Timeline and Grading
The course is primarily project based, but there will be readings throughout the course and participation points.
Project Timeline and Assignments (70% total): (see links for more details)
Date | Report | Deliverable | Weight |
---|---|---|---|
Sep 13 | | Submit group members and name | |
Sep 19 | R1 | Dataset Proposal and Analysis | 10% |
Oct 8 / 10 | | Midterm Presentation | 10% |
Oct 10 | R2 | Baselines and Model Proposal | 10% |
Nov 7 | R3 | Analysis of Baselines | 10% |
Dec 3 / 5 | | Final Presentation | 10% |
Dec 11 | R4 | Completed Report | 20% |
Discussion Participation (20%, with up to 2% bonus):
We will have discussion posts for class sessions on Canvas. We will ask for discussion in two types of sessions:
- Lectures. We will have a Canvas discussion thread where you should ask a question or give an answer. Questions should be thoughtful and have citations (e.g. to papers, points in the lecture slides, or other resources). Please try to ask questions that you and others might actually want the answers to! We will also give credit for asking questions in lectures, but you must post a note in the Canvas discussion thread after class documenting your participation so that the TAs can tally it.
- Project presentations. We will pair you with a group and ask you to give peer feedback (via email or Canvas, TBD) on their project – making suggestions on the work they are doing, as given in the presentation.
Paper Readings (10%):
These readings start generic and become specific. You are building the related work section of your project. We will begin by offering readings to choose from, and you will eventually be finding your own readings. You will submit annotations on the paper with your questions and notes, tied to particular parts of the paper. You can do this either by uploading a PDF with your notes (handwritten, or using a PDF comment function) or, if necessary, by uploading a text file. If you use a text file, you *must* have notes that refer to specific parts of the paper, e.g. Equation 2, Section 2.3, Conclusion. Paper summaries are due the following Monday night (1 week after being assigned).
Submission Policies:
- All deadlines are midnight EST (determined by Canvas submission)
- Project reports are graded as a group (single PDF submission), while all other grades are individual.
- Late days: Every team has a budget of 6 late days. Late days are counted automatically; once the budget is used up, 2% absolute is removed from the maximum grade. You can use up to 3 of these late days on the Final Report. (updated 11/1)
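To see how the percentages above combine, here is a minimal sketch in Python. The weights are copied from the timeline and the participation and readings headings. The function name `course_grade`, the use of fractional scores between 0 and 1, and the choice to add the participation bonus on top of the weighted total are illustrative assumptions, not the official grading procedure. Late-day penalties are not modeled.

```python
# Component weights (percent of the course grade), as listed in the syllabus.
PROJECT_WEIGHTS = {
    "R1: Dataset Proposal and Analysis": 10,
    "Midterm Presentation": 10,
    "R2: Baselines and Model Proposal": 10,
    "R3: Analysis of Baselines": 10,
    "Final Presentation": 10,
    "R4: Completed Report": 20,
}
OTHER_WEIGHTS = {
    "Discussion Participation": 20,
    "Paper Readings": 10,
}
WEIGHTS = {**PROJECT_WEIGHTS, **OTHER_WEIGHTS}

# The syllabus allows up to 2% bonus for discussion participation.
MAX_PARTICIPATION_BONUS = 2.0


def course_grade(scores, participation_bonus=0.0):
    """Return a weighted course grade in percent.

    scores: dict mapping component name to the fraction earned (0.0 to 1.0).
            Components missing from the dict count as 0.
    participation_bonus: bonus points in percent, capped at 2 (assumption:
            the bonus is simply added to the weighted total).
    """
    for name, fraction in scores.items():
        if name not in WEIGHTS:
            raise ValueError(f"unknown component: {name}")
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"score for {name} must be between 0 and 1")
    total = sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    bonus = min(max(participation_bonus, 0.0), MAX_PARTICIPATION_BONUS)
    return total + bonus


if __name__ == "__main__":
    full_marks = {name: 1.0 for name in WEIGHTS}
    print(sum(PROJECT_WEIGHTS.values()))  # project share of the grade
    print(course_grade(full_marks))       # every component at full credit
```

The project components sum to the 70% stated in the timeline heading, and together with participation (20%) and readings (10%) the weights cover the whole grade.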
Tasks & Datasets
The course is primarily centered on a project. We will have a list of seed tasks and datasets below (coming soon!).
Piyush | Visual question answering, uncertainty estimation and calibration in multimodal models, multimodal reasoning using code generation. |
Li-Wei | Speech processing and generation |
Madhura | Vision-language models, image/text modalities |
Haoyang | Embodiment, Language+Vision |
Lawrence | OS/Web Agents |
Siddhant | Visual Common Sense Reasoning, Multimodal Question Answering |
Physical hardware / robots / sensors ... | What about physical hardware? robots? tasks not datasets? Let's talk. |
Compute
Limited AWS compute credits will be made available to each student, so please consider both your interests and available compute resources when deciding on a dataset/project.
Lectures
(tentative)
Tuesday | Thursday |
---|---|
Aug 27: Course Structure | Aug 29: Multimodal applications and datasets |
Overview Readings: | |
Sep 3: Unimodal representations (Language) | Sep 5: Unimodal representations (Vision + Audio) |
Readings: | |
Sep 10: Fusion | Sep 12: Fission |
Readings: | |
Sep 17: Alignment + Grounding | Sep 19: Aligned Representations |
Sep 24: Project Hours | Sep 26: Project Hours |
Oct 1: Transformers Part 1 | Oct 3: Transformers Part 2 |
Oct 8: Midterm Presentations | Oct 10: Midterm Presentations |
Oct 15: Fall Break! | Oct 17: Fall Break! |
Oct 22: Generation | Oct 24: Generation (cont.) |
Oct 29: Embodiment | Oct 31: Embodiment (cont.) |
Nov 5: Election Day (university holiday) | Nov 7: Agents Guest Lecture: JY Koh |
Nov 12: Reasoning | Nov 14: Guest Lecture: So Yeon Tiffany Min |
Nov 19: Guest Lecture: Lili Yu | Nov 21: Quantification + Bias |
Nov 26: New research directions | Nov 28: Thanksgiving (no class) |
Dec 3: Final Presentations | Dec 5: Final Presentations |