Ollama

Ollama is an open-source tool for running large language models locally. It provides a simple command-line interface and API for downloading, managing, and running LLMs on your own hardware.

Overview

Ollama packages models into a single executable with all dependencies included. It supports macOS, Linux, and Windows, and uses llama.cpp for efficient inference on both CPU and GPU. The tool is designed to make local LLM access as simple as possible.

Key Features

  • Simple CLI: Pull and run models with a single command (ollama run <model>)
  • Local REST API: Built-in HTTP API compatible with the OpenAI Chat Completions format
  • Model Library: Curated list of models available via ollama pull (Llama, Mistral, Gemma, Phi, etc.)
  • Hardware Acceleration: Automatic GPU detection and acceleration via CUDA, ROCm, and Metal
  • Modelfiles: Custom model definitions with parameters, system prompts, and license info
  • Cross-Platform: Native support for macOS (including Apple Silicon), Linux, and Windows

Licensing

Ollama is open source (MIT License). The Ollama application is free to use, modify, and distribute. Individual models have their own licenses as defined by their creators.

See Ollama GitHub for license details.

Official Resources