Ollama
Written by [3]Mattt on February 14th, 2025

“Only Apple can do this”
Variously attributed to Tim Cook
Apple introduced [4]Apple Intelligence at WWDC 2024. After waiting almost a
year for Apple to, in Craig Federighi’s words, “get it right”, its promise of
“AI for the rest of us” feels just as distant as ever.

Can we take a moment to appreciate the name? Apple Intelligence. AI. That’s
some S-tier semantic appropriation. On the level of jumping on “podcast” before
anyone knew what else to call that.
While we wait for Apple Intelligence to arrive on our devices, something
remarkable is already running on our Macs. Think of it as a locavore approach
to artificial intelligence: homegrown, sustainable, and available year-round.

This week on NSHipster, we’ll look at how you can use Ollama to run LLMs
locally on your Mac — both as an end-user and as a developer.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[5]What is Ollama?

Ollama is the easiest way to run large language models on your Mac. You can
think of it as “Docker for LLMs” - a way to pull, run, and manage AI models as
easily as containers.

Download Ollama with [6]Homebrew or directly from [7]their website. Then pull
and run [8]llama3.2 (2GB):
$ brew install --cask ollama
$ ollama run llama3.2
>>> Tell me a joke about Swift programming.
What's a Apple developer's favorite drink?
The Kool-Aid.
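The Docker parallel carries over to the rest of the command line, too. A few everyday commands, assuming Ollama is installed (`ollama --help` lists them all):

```shell
$ ollama pull llama3.2    # download a model without running it
$ ollama list             # show models you've downloaded
$ ollama show llama3.2    # inspect a model's parameters and template
$ ollama rm llama3.2      # delete a model to reclaim disk space
```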
Under the hood, Ollama is powered by [9]llama.cpp. But where llama.cpp provides
the engine, Ollama gives you a vehicle you’d actually want to drive — handling
all the complexity of model management, optimization, and inference.

Similar to how Dockerfiles define container images, Ollama uses Modelfiles to
configure model behavior:
FROM mistral:latest
PARAMETER temperature 0.7
TEMPLATE """
You are a helpful assistant.

User: {{ .Prompt }}
Assistant: """
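Saved as `Modelfile`, a definition like this can be built and run with a name of your choosing (`helpful-mistral` here is just an illustrative name):

```shell
$ ollama create helpful-mistral -f ./Modelfile
$ ollama run helpful-mistral
```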
Ollama uses the [10]Open Container Initiative (OCI) standard to distribute
models. Each model is split into layers and described by a manifest, the same
approach used by Docker containers:
{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.ollama.image.config.v1+json",
    "digest": "sha256:..."
  },
  "layers": [
    {
      "mediaType": "application/vnd.ollama.image.layer.v1+json",
      "digest": "sha256:...",
      "size": 4019248935
    }
  ]
}
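You can see this layout on disk for yourself. By default, Ollama keeps manifests and content-addressed blobs under `~/.ollama/models` (the exact paths may vary by version and platform):

```shell
$ ls ~/.ollama/models
blobs     manifests
```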
Overall, Ollama’s approach is thoughtful and well-engineered. And best of all,
it just works.

[11]What’s the big deal about running models locally?

[12]Jevons paradox states that, as something becomes more efficient, we tend to
use more of it, not less.

Having AI on your own device changes everything. When computation becomes
essentially free, you start to see intelligence differently.

While frontier models like GPT-4 and Claude are undeniably miraculous, there’s
something to be said for the small miracle of running open models locally.
• Privacy: Your data never leaves your device. Essential for working with
  sensitive information.
• Cost: Run 24/7 without usage meters ticking. No more rationing prompts like
  ’90s cell phone minutes. Just a fixed, up-front cost for unlimited inference.
• Latency: No network round-trips means faster responses. Your
  /M\d Mac((Book( Pro| Air)?)|Mini|Studio)/ can easily generate dozens of
  tokens per second. (Try to keep up!)
• Control: No black-box [13]RLHF or censorship. The AI works for you, not the
  other way around.
• Reliability: No outages or API quota limits. 100% uptime for your
  [14]exocortex. Like having Wikipedia on a thumb drive.
[15]Building macOS Apps with Ollama

Ollama also exposes an [16]HTTP API on port 11434 ([17]leetspeak for llama 🦙).
This makes it easy to integrate with any programming language or tool.
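You can smoke-test the API with curl before reaching for a client library. Assuming the Ollama app is running, the `/api/generate` endpoint takes a model name and a prompt; setting `"stream": false` returns a single JSON response rather than a stream of tokens:

```shell
$ curl http://localhost:11434/api/generate \
    -d '{"model": "llama3.2", "prompt": "Tell me a joke about Swift programming.", "stream": false}'
```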
To that end, we’ve created the [18]Ollama Swift package to help developers
integrate Ollama into their apps.
[19]Text Completions

The simplest way to use a language model is to generate text from a prompt:
import Ollama

let client = Client.default
let response = try await client.generate(
    model: "llama3.2",
    prompt: "Tell me a joke about Swift programming.",
    options: ["temperature": 0.7]
)
print(response.response)
// How many Apple engineers does it take to document an API?
// None - that's what WWDC videos are for.
[20]Chat Completions

For more structured interactions, you can use the chat API to maintain a
conversation with multiple messages and different roles:
let initialResponse = try await client.chat(
    model: "llama3.2",
    messages: [
        .system("You are a helpful assistant."),
        .user("What city is Apple located in?")
    ]
)
print(initialResponse.message.content)
// Apple's headquarters, known as the Apple Park campus, is located in Cupertino, California.
// The company was originally founded in Los Altos, California, and later moved to Cupertino in 1997.

let followUp = try await client.chat(
    model: "llama3.2",
    messages: [
        .system("You are a helpful assistant."),
        .user("What city is Apple located in?"),
        .assistant(initialResponse.message.content),
        .user("Please summarize in a single word")
    ]
)
print(followUp.message.content)
// Cupertino
[21]Generating text embeddings

[22]Embeddings convert text into high-dimensional vectors that capture semantic
meaning. These vectors can be used to find similar content or perform semantic
search.

For example, if you wanted to find documents similar to a user’s query:
let documents: [String] = …

// Convert text into vectors we can compare for similarity
let embeddings = try await client.embeddings(
    model: "nomic-embed-text",
    texts: documents
)

/// Finds relevant documents
func findRelevantDocuments(
    for query: String,
    threshold: Float = 0.7, // cutoff for matching, tunable
    limit: Int = 5
) async throws -> [String] {
    // Get an embedding for the query, using the same model
    // as the documents so the vectors are comparable
    let queryEmbedding = try await client.embeddings(
        model: "nomic-embed-text",
        texts: [query]
    )[0]

    // See: https://en.wikipedia.org/wiki/Cosine_similarity
    func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
        let dotProduct = zip(a, b).map(*).reduce(0, +)
        func magnitude(_ v: [Float]) -> Float {
            sqrt(v.map { $0 * $0 }.reduce(0, +))
        }
        return dotProduct / (magnitude(a) * magnitude(b))
    }

    // Find documents above the similarity threshold,
    // best matches first, at most `limit` results
    let rankedDocuments = zip(embeddings, documents)
        .map { embedding, document in
            (similarity: cosineSimilarity(embedding, queryEmbedding),
             document: document)
        }
        .filter { $0.similarity >= threshold }
        .sorted { $0.similarity > $1.similarity }
        .prefix(limit)

    return rankedDocuments.map(\.document)
}
For simple use cases, you can also use Apple’s [23]Natural Language framework
for text embeddings. They’re fast and don’t require additional dependencies.
import NaturalLanguage

let embedding = NLEmbedding.wordEmbedding(for: .english)
let vector = embedding?.vector(for: "swift")
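NLEmbedding can also compare words directly, without any manual vector math. A quick sketch: `distance(between:and:)` returns smaller values for more similar words (cosine distance by default), and `neighbors(for:maximumCount:)` finds the nearest words in the embedding space.

```swift
import NaturalLanguage

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    // Smaller distance = more similar
    let distance = embedding.distance(between: "swift", and: "rapid")
    // Nearest words to "swift" in the embedding space
    let neighbors = embedding.neighbors(for: "swift", maximumCount: 5)
    print(distance, neighbors)
}
```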
[24]Building a RAG System

Embeddings really shine when combined with text generation in a RAG (Retrieval
Augmented Generation) workflow. Instead of asking the model to generate
information from its training data, we can ground its responses in our own
documents by:

1. Converting documents into embeddings
2. Finding relevant documents based on the query
3. Using those documents as context for generation

Here’s a simple example:
let query = "What were AAPL's earnings in Q3 2024?"
let relevantDocs = try await findRelevantDocuments(for: query)
let context = """
Use the following documents to answer the question.
If the answer isn't contained in the documents, say so.

Documents:
\(relevantDocs.joined(separator: "\n---\n"))

Question: \(query)
"""

let response = try await client.generate(
    model: "llama3.2",
    prompt: context
)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

To summarize: Different models have different capabilities.
• Models like [25]llama3.2 and [26]deepseek-r1 generate text.
  □ Some text models have “base” or “instruct” variants, suitable for
    fine-tuning or chat completion, respectively.
  □ Some text models are tuned to support [27]tool use, which lets them
    perform more complex tasks and interact with the outside world.
• Models like [28]llama3.2-vision can take images along with text as inputs.
• Models like [29]nomic-embed-text create numerical vectors that capture
  semantic meaning.

With Ollama, you get unlimited access to a wealth of these and many more
open-source language models.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

So, what can you build with all of this? Here’s just one example:
[30]Nominate.app

[31]Nominate is a macOS app that uses Ollama to intelligently rename PDF files
based on their contents.

Like many of us striving for a paperless lifestyle, you might find yourself
scanning documents only to end up with cryptically-named PDFs like
Scan2025-02-03_123456.pdf. Nominate solves this by combining AI with
traditional NLP techniques to automatically generate descriptive filenames
based on document contents.
The app leverages several technologies we’ve discussed:

• Ollama’s API for content analysis via the ollama-swift package
• Apple’s PDFKit for OCR
• The Natural Language framework for text processing
• Foundation’s DateFormatter for parsing dates

Nominate performs all processing locally. Your documents never leave your
computer. This is a key advantage of running models locally versus using cloud
APIs.
[32]Looking Ahead

“The future is already here – it’s just not evenly distributed yet.”
William Gibson

Think about the timelines:

• Apple Intelligence was announced last year.
• Swift came out 10 years ago.
• SwiftUI 6 years ago.

If you wait for Apple to deliver on its promises, you’re going to miss out on
the most important technological shift in a generation.

The future is here today. You don’t have to wait. With Ollama, you can start
building the next generation of AI-powered apps right now.
NSMutableHipster

Questions? Corrections? [33]Issues and [34]pull requests are always welcome.

This article uses Swift version 6.0. Find status information for all articles
on the [35]status page.

Written by [36]Mattt

[37]Mattt ([38]@mattt) is a writer and developer in Portland, Oregon.

🅭 🅯 🄏 NSHipster.com is released under a [39]Creative Commons BY-NC License.
References:

[1] https://nshipster.com/
[2] https://nshipster.com/ollama/
[3] https://nshipster.com/authors/mattt/
[4] https://www.apple.com/apple-intelligence/
[5] https://nshipster.com/ollama/#what-is-ollama
[6] https://brew.sh/
[7] https://ollama.com/download
[8] https://ollama.com/library/llama3.2
[9] https://github.com/ggerganov/llama.cpp
[10] https://opencontainers.org/
[11] https://nshipster.com/ollama/#whats-the-big-deal-about-running-models-locally
[12] https://en.wikipedia.org/wiki/Jevons_paradox
[13] https://knowyourmeme.com/photos/2546581-shoggoth-with-smiley-face-artificial-intelligence
[14] https://en.wiktionary.org/wiki/exocortex
[15] https://nshipster.com/ollama/#building-macos-apps-with-ollama
[16] https://github.com/ollama/ollama/blob/main/docs/api.md
[17] https://en.wikipedia.org/wiki/Leet
[18] https://github.com/mattt/ollama-swift
[19] https://nshipster.com/ollama/#text-completions
[20] https://nshipster.com/ollama/#chat-completions
[21] https://nshipster.com/ollama/#generating-text-embeddings
[22] https://en.wikipedia.org/wiki/Word_embedding
[23] https://developer.apple.com/documentation/naturallanguage/
[24] https://nshipster.com/ollama/#building-a-rag-system
[25] https://ollama.com/library/llama3.2
[26] https://ollama.com/library/deepseek-r1
[27] https://ollama.com/blog/tool-support
[28] https://ollama.com/library/llama3.2-vision
[29] https://ollama.com/library/nomic-embed-text
[30] https://nshipster.com/ollama/#nominateapp
[31] https://github.com/nshipster/nominate
[32] https://nshipster.com/ollama/#looking-ahead
[33] https://github.com/NSHipster/articles/issues
[34] https://github.com/NSHipster/articles/blob/master/2025-02-14-ollama.md
[35] https://nshipster.com/status/
[36] https://nshipster.com/authors/mattt/
[37] https://github.com/mattt
[38] https://twitter.com/mattt
[39] https://creativecommons.org/licenses/by-nc/4.0/