Coursera

End-to-End Multimodal AI: Fine-Tuning, Fusion, and MLOps

Grow your skills with Coursera Plus for $239/year (usually $399). Save now.

Coursera

End-to-End Multimodal AI: Fine-Tuning, Fusion, and MLOps

Included with Coursera Plus

Gain insight into a topic and learn the fundamentals.
Intermediate level

Recommended experience

2 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace
Gain insight into a topic and learn the fundamentals.
Intermediate level

Recommended experience

2 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Fine-tune transformer-based multimodal models using transfer learning in PyTorch and TensorFlow.

  • Build cross-modal retrieval systems using FAISS and attention-based fusion of visual and text embeddings.

  • Automate ML pipelines with drift monitoring, hyperparameter tuning, and retraining using MLflow and Ray Tune.

  • Design and document versioned multimodal inference APIs with FastAPI, OAuth2, and OpenAPI specifications.

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

March 2026

Assessments

38 assignments¹

AI Graded see disclaimer
Taught in English

See how employees at top companies are mastering in-demand skills

 logos of Petrobras, TATA, Danone, Capgemini, P&G and L'Oreal

Build your Algorithms expertise

This course is part of the Multimodal Intelligence - Vision, Audio & Language in Action Professional Certificate
When you enroll in this course, you'll also be enrolled in this Professional Certificate.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate from Coursera

There are 20 modules in this course

You will build the foundational MLOps infrastructure for multimodal AI systems by designing modular data pipeline components and implementing your first multimodal transformer fine-tuning workflow using open source tools.

What's included

3 videos1 reading1 assignment1 ungraded lab

You will accelerate multimodal model development using transfer learning techniques and implement the transformation and loading pipeline stages that deliver processed data and trained models reliably to downstream systems.

What's included

1 video1 reading3 assignments

You will identify and analyze training and validation metric patterns to diagnose overfitting and gradient stability issues using TensorBoard visualization tools.

What's included

2 videos1 reading1 assignment1 ungraded lab

You will implement targeted interventions including gradient clipping and early stopping to stabilize training processes and prevent common neural network training failures.

What's included

1 video1 reading3 assignments

You will learn systematic image preprocessing techniques including normalization and color-space conversions to prepare raw visual data for computer vision applications.

What's included

3 videos1 reading1 assignment1 ungraded lab

You will learn optical flow and frame differencing techniques to extract temporal motion features from video sequences for computer vision applications.

What's included

2 videos1 reading2 assignments

You will establish foundational understanding of systematic error analysis approaches and learn to evaluate computer vision model performance beyond basic accuracy metrics.

What's included

2 videos1 reading1 assignment1 ungraded lab

You will apply advanced techniques to identify systematic failure patterns in computer vision models and generate comprehensive quality reports for model improvement.

What's included

1 video1 reading3 assignments

You will build foundational understanding of cross-modal retrieval systems and implement approximate nearest-neighbor search algorithms using FAISS for production-scale similarity search across multimodal embeddings.

What's included

1 video2 readings1 assignment1 ungraded lab

You will design and implement sophisticated attention-based fusion algorithms that intelligently combine visual and textual embeddings, mastering the creation of multimodal neural architectures for advanced cross-modal AI applications.

What's included

2 readings3 assignments

You will learn the foundational concepts of computational complexity analysis, learning to systematically evaluate fusion algorithms using Big O notation and profiling tools.

What's included

3 videos1 reading1 assignment1 ungraded lab

You will apply complexity analysis skills to make strategic optimization decisions, evaluating trade-offs between performance, accuracy, and resource constraints in real-world deployment scenarios.

What's included

1 video3 assignments

You will learn the systematic evaluation of production ML models to identify performance degradation and implement drift detection systems that automatically trigger remediation actions.

What's included

1 video1 reading1 assignment1 ungraded lab

You will build comprehensive automated ML pipelines with integrated hyperparameter optimization and end-to-end automation that maintains model performance in production environments.

What's included

2 videos1 reading3 assignments

You will build foundational skills for systematically analyzing multimodal AI model outputs, understanding cross-modal relationships, and preparing technical findings for stakeholder communication.

What's included

2 videos1 reading1 assignment1 ungraded lab

You will learn the critical skills of translating complex multimodal AI analysis into compelling business narratives, creating executive-level presentations, and developing stakeholder communication frameworks that drive strategic decisions.

What's included

2 videos1 reading3 assignments

You will design and implement versioned API endpoints specifically optimized for multimodal AI inference workloads

What's included

3 videos1 reading2 assignments

You will implement comprehensive OAuth2 authentication systems and observability middleware for production API services

What's included

2 videos1 reading2 assignments

You will create comprehensive OpenAPI specifications that enable automated testing, client generation, and seamless integration

What's included

2 videos1 reading2 assignments1 ungraded lab

You will build a production-grade multimodal AI system that processes visual and textual data, integrating fine-tuning, cross-modal fusion, and deployment-ready inference services.This capstone synthesizes model optimization, data engineering, API design, and MLOps practices to deliver a deployable, monitored multimodal application.

What's included

4 readings1 assignment

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Professionals from the Industry
290 Courses 43,476 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."
Coursera Plus

Open new doors with Coursera Plus

Unlimited access to 10,000+ world-class courses, hands-on projects, and job-ready certificate programs - all included in your subscription

Advance your career with an online degree

Earn a degree from world-class universities - 100% online

Join over 3,400 global companies that choose Coursera for Business

Upskill your employees to excel in the digital economy

Frequently asked questions

¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.