


Inference techniques for local and cloud LLM deployment
This course is part of Building Generative AI Apps with Llama Professional Certificate

Instructor: Taught by Meta Staff
Included with 
Recommended experience
What you'll learn
- The principles of LLM inference and prompt pipelines for real-world tasks. 
- Running small and medium LLMs locally with Ollama and deploying larger models in the cloud using Python. 
- Building and documenting LLM-powered tools ready for real-world use. 
Details to know

Add to your LinkedIn profile
See how employees at top companies are mastering in-demand skills

Build your Software Development expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate from Meta

Explore more from Software Development
Why people choose Coursera for their career





Open new doors with Coursera Plus
Unlimited access to 10,000+ world-class courses, hands-on projects, and job-ready certificate programs - all included in your subscription
Advance your career with an online degree
Earn a degree from world-class universities - 100% online
Join over 3,400 global companies that choose Coursera for Business
Upskill your employees to excel in the digital economy
Frequently asked questions
Developers, entrepreneurs, and technical professionals with 1-2 years of Python knowledge who want to build AI-enabled assistants. Ideal for those looking to upskill in generative AI development or create practical business solutions using Llama models.
1-2 years of Python programming experience (for those who need to meet this prerequisite, start with the Meta Programming in Python course)
Familiarity with command-line interfaces
Understanding of basic software development concepts
Basic knowledge of REST APIs
In this program, you will be guided to access the Llama 4 Scout 17B and Llama 3.1 8B models via API in Courses 1 and 2. The course content includes examples using one of the API providers, but you are free to choose any provider that offers access to Llama models for your learning experience. Examples of such providers include Together AI, Groq, Hugging Face, and others.
In Course 3, you will be guided to use the Llama 3.1 8B model in a local environment. Llama models are available from multiple sources, with the best place to start being https://www.llama.com.
Models are also hosted and distributed by partners such as Amazon Web Services, Microsoft Azure, Google Cloud, IBM Watsonx, Oracle Cloud, Snowflake, Databricks, Dell, Hugging Face, Groq, Cerebras, SambaNova, and many others. See the Llama.com FAQ for more information.
More questions
Financial aid available,

