Ollama Tutorial: Running LLMs Locally Made Super Simple

Want to run large language models on your machine? Learn how to do so using Ollama in this quick tutorial.



 

Image by Author

 

Running large language models (LLMs) locally can be super helpful—whether you'd like to play around with LLMs or build more powerful apps using them. But configuring your working environment and getting LLMs to run on your machine is not trivial.

So how do you run LLMs locally without any of the hassle? Enter Ollama, a platform that makes local development with open-source large language models a breeze. With Ollama, everything you need to run an LLM—model weights and all of the config—is packaged into a single Modelfile. Think Docker for LLMs.

In this tutorial, we’ll take a look at how to get started with Ollama to run large language models locally. So let’s get right into the steps!

 

Step 1: Download Ollama to Get Started

 

As a first step, you should download Ollama to your machine. Ollama is supported on all major platforms: macOS, Windows, and Linux.

To download Ollama, you can either visit the official GitHub repo and follow the download links from there, or visit the official website and download the installer if you are on a Mac or a Windows machine.

I’m on Linux (Ubuntu distro). So if you’re a Linux user like me, you can run the following command to download and run the installer script:

$ curl -fsSL https://ollama.com/install.sh | sh

 

The installation typically takes a few minutes. During installation, any NVIDIA/AMD GPUs will be auto-detected; make sure you have the drivers installed. CPU-only mode works fine too, but inference may be much slower.
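Once the installer finishes, you can quickly verify that the Ollama CLI is available:

$ ollama --version

If this prints a version string, you’re good to go.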

 

Step 2: Get the Model

 

Next, you can visit the model library to check the list of all model families currently supported. By default, the model tagged latest is downloaded. On the page for each model, you can get more info such as the size and quantization used.

You can search through the list of tags to locate the model that you want to run. For each model family, there are typically foundational models of different sizes and instruction-tuned variants. I’m interested in running the Gemma 2B model from the Gemma family of lightweight models from Google DeepMind.

You can run the model using the ollama run command to pull and start interacting with the model directly. However, you can also pull the model onto your machine first and then run it. This is very similar to how you work with Docker images.

For Gemma 2B, running the following pull command downloads the model onto your machine:

$ ollama pull gemma:2b

 

The download is about 1.7 GB, so the pull should take a minute or two:
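Once the pull completes, you can list the models available locally to confirm the download:

$ ollama list

The output should include an entry for gemma:2b along with its size and modification time.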

 
[Image: output of the ollama pull command]
 

Step 3: Run the Model

 

Run the model using the ollama run command as shown:

$ ollama run gemma:2b

 

Doing so will start an Ollama REPL where you can interact with the Gemma 2B model. Here’s an example:

 
[Image: sample response from Gemma 2B]
 

For a simple question about the Python standard library, the response seems pretty okay, and it mentions the most frequently used modules.

 

Step 4: Customize Model Behavior with System Prompts

 

You can customize an LLM’s behavior by setting a system prompt, like so:

  • Set the system prompt for the desired behavior.
  • Save the model by giving it a name.
  • Exit the REPL and run the model you just created.

Say you want the model to always explain concepts or answer questions in plain English, with as little technical jargon as possible. Here’s how you can go about doing it:

>>> /set system For all questions asked answer in plain English avoiding technical jargon as much as possible
Set system message.
>>> /save ipe
Created new model 'ipe'
>>> /bye

 

Now run the model you just created:

$ ollama run ipe

 

Here’s an example:

 
[Image: sample response from the customized ipe model]
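As an aside, you can capture the same customization declaratively in a Modelfile, which is convenient if you want to version your setup. A minimal sketch (the file name Modelfile is just the convention):

FROM gemma:2b
SYSTEM "For all questions asked, answer in plain English, avoiding technical jargon as much as possible."

You can then build the custom model from it with:

$ ollama create ipe -f ./Modelfile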
 

Step 5: Use Ollama with Python

 

Running the Ollama command-line client and interacting with LLMs locally at the Ollama REPL is a good start. But often you’d want to use LLMs in your applications. You can run Ollama as a server on your machine and send it cURL requests.
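For example, with the Ollama server running (it listens on localhost port 11434 by default), you can send a generation request like this:

$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "What is a qubit?",
  "stream": false
}'

Setting stream to false returns the full response as a single JSON object instead of a stream of tokens.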

But there are simpler ways. If you like using Python and want to build LLM apps, here are a couple of ways to do it:

  • Using the official Ollama Python library
  • Using Ollama with LangChain

Pull the models you need to use before you run the snippets in the following sections.
 

Using the Ollama Python Library

 
To use the Ollama Python library, install it using pip like so:

$ pip install ollama

 

There is an official JavaScript library too, which you can use if you prefer developing with JS.

Once you install the Ollama Python library, you can import it in your Python application and work with large language models. Here's the snippet for a simple language generation task:

import ollama

response = ollama.generate(model='gemma:2b',
                           prompt='what is a qubit?')
print(response['response'])
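The library also supports a chat-style interface that takes a list of messages, which is handy for multi-turn conversations. Here’s a minimal sketch (this assumes the Ollama server is running and gemma:2b has been pulled):

import ollama

# Pass the conversation as a list of role/content messages
response = ollama.chat(model='gemma:2b',
                       messages=[{'role': 'user', 'content': 'what is a qubit?'}])
print(response['message']['content'])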

 

Using LangChain

 
Another way to use Ollama with Python is through LangChain. If you have existing projects using LangChain, it’s easy to integrate with or switch to Ollama.

Make sure you have LangChain installed, along with the langchain-community package that provides the Ollama integration. If not, install them using pip:

$ pip install langchain langchain-community

 

Here's an example:

from langchain_community.llms import Ollama

llm = Ollama(model="llama2")

# Print the model's response
print(llm.invoke("tell me about partial functions in python"))

 

Using LLMs like this in Python apps makes it easier to switch between different LLMs depending on the application.

 

Wrapping Up

 

With Ollama you can run large language models locally and build LLM-powered apps with just a few lines of Python code. Here we explored how to interact with LLMs at the Ollama REPL as well as from within Python applications.

Next we'll try building an app using Ollama and Python. Until then, if you’re looking to dive deep into LLMs check out 7 Steps to Mastering Large Language Models (LLMs).

 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.