Ollama
Written by [3]Mattt on February 14th, 2025

“Only Apple can do this”
Variously attributed to Tim Cook
Apple introduced [4]Apple Intelligence at WWDC 2024. After waiting almost a
year for Apple to, in Craig Federighi’s words, “get it right”, its promise of
“AI for the rest of us” feels just as distant as ever.

Can we take a moment to appreciate the name? Apple Intelligence. AI. That’s
some S-tier semantic appropriation. On the level of jumping on “podcast” before
anyone knew what else to call that.
While we wait for Apple Intelligence to arrive on our devices, something
remarkable is already running on our Macs. Think of it as a locavore approach
to artificial intelligence: homegrown, sustainable, and available year-round.

This week on NSHipster, we’ll look at how you can use Ollama to run LLMs
locally on your Mac — both as an end-user and as a developer.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[5]What is Ollama?

Ollama is the easiest way to run large language models on your Mac. You can
think of it as “Docker for LLMs” - a way to pull, run, and manage AI models as
easily as containers.

Download Ollama with [6]Homebrew or directly from [7]their website. Then pull
and run [8]llama3.2 (2GB):
$ brew install --cask ollama
$ ollama run llama3.2
>>> Tell me a joke about Swift programming.
What's a Apple developer's favorite drink?
The Kool-Aid.
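The Docker parallel carries over to the rest of the command line, too. A few everyday commands, assuming Ollama is installed (`ollama --help` lists them all):

```shell
$ ollama pull llama3.2    # download a model without running it
$ ollama list             # show models you've downloaded
$ ollama show llama3.2    # inspect a model's parameters and template
$ ollama rm llama3.2      # delete a model to reclaim disk space
```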
Under the hood, Ollama is powered by [9]llama.cpp. But where llama.cpp provides
the engine, Ollama gives you a vehicle you’d actually want to drive — handling
all the complexity of model management, optimization, and inference.

Similar to how Dockerfiles define container images, Ollama uses Modelfiles to
configure model behavior:
FROM mistral:latest
PARAMETER temperature 0.7
TEMPLATE """
You are a helpful assistant.

User: {{ .Prompt }}
Assistant: """
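Saved as `Modelfile`, a definition like this can be built and run with a name of your choosing (`helpful-mistral` here is just an illustrative name):

```shell
$ ollama create helpful-mistral -f ./Modelfile
$ ollama run helpful-mistral
```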
Ollama uses the [10]Open Container Initiative (OCI) standard to distribute
models. Each model is split into layers and described by a manifest, the same
approach used by Docker containers:
{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.ollama.image.config.v1+json",
    "digest": "sha256:..."
  },
  "layers": [
    {
      "mediaType": "application/vnd.ollama.image.layer.v1+json",
      "digest": "sha256:...",
      "size": 4019248935
    }
  ]
}
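You can see this layout on disk for yourself. By default, Ollama keeps manifests and content-addressed blobs under `~/.ollama/models` (the exact paths may vary by version and platform):

```shell
$ ls ~/.ollama/models
blobs     manifests
```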
Overall, Ollama’s approach is thoughtful and well-engineered. And best of all,
it just works.

[11]What’s the big deal about running models locally?

[12]Jevons paradox states that, as something becomes more efficient, we tend to
use more of it, not less.

Having AI on your own device changes everything. When computation becomes
essentially free, you start to see intelligence differently.

While frontier models like GPT-4 and Claude are undeniably miraculous, there’s
something to be said for the small miracle of running open models locally.
• Privacy: Your data never leaves your device. Essential for working with
  sensitive information.
• Cost: Run 24/7 without usage meters ticking. No more rationing prompts like
  ’90s cell phone minutes. Just a fixed, up-front cost for unlimited inference.
• Latency: No network round-trips means faster responses. Your
  /M\d Mac((Book( Pro| Air)?)|Mini|Studio)/ can easily generate dozens of
  tokens per second. (Try to keep up!)
• Control: No black-box [13]RLHF or censorship. The AI works for you, not the
  other way around.
• Reliability: No outages or API quota limits. 100% uptime for your
  [14]exocortex. Like having Wikipedia on a thumb drive.
[15]Building macOS Apps with Ollama

Ollama also exposes an [16]HTTP API on port 11434 ([17]leetspeak for llama 🦙).
This makes it easy to integrate with any programming language or tool.
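You can smoke-test the API with curl before reaching for a client library. Assuming the Ollama app is running, the `/api/generate` endpoint takes a model name and a prompt; setting `"stream": false` returns a single JSON response rather than a stream of tokens:

```shell
$ curl http://localhost:11434/api/generate \
    -d '{"model": "llama3.2", "prompt": "Tell me a joke about Swift programming.", "stream": false}'
```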
To that end, we’ve created the [18]Ollama Swift package to help developers
integrate Ollama into their apps.
[19]Text Completions

The simplest way to use a language model is to generate text from a prompt:
import Ollama

let client = Client.default
let response = try await client.generate(
    model: "llama3.2",
    prompt: "Tell me a joke about Swift programming.",
    options: ["temperature": 0.7]
)
print(response.response)
// How many Apple engineers does it take to document an API?
// None - that's what WWDC videos are for.
[20]Chat Completions

For more structured interactions, you can use the chat API to maintain a
conversation with multiple messages and different roles:
let initialResponse = try await client.chat(
    model: "llama3.2",
    messages: [
        .system("You are a helpful assistant."),
        .user("What city is Apple located in?")
    ]
)
print(initialResponse.message.content)
// Apple's headquarters, known as the Apple Park campus, is located in Cupertino, California.
// The company was originally founded in Los Altos, California, and later moved to Cupertino in 1997.

let followUp = try await client.chat(
    model: "llama3.2",
    messages: [
        .system("You are a helpful assistant."),
        .user("What city is Apple located in?"),
        .assistant(initialResponse.message.content),
        .user("Please summarize in a single word")
    ]
)
print(followUp.message.content)
// Cupertino
[21]Generating text embeddings

[22]Embeddings convert text into high-dimensional vectors that capture semantic
meaning. These vectors can be used to find similar content or perform semantic
search.

For example, if you wanted to find documents similar to a user’s query:
let documents: [String] = …

// Convert text into vectors we can compare for similarity
let embeddings = try await client.embeddings(
    model: "nomic-embed-text",
    texts: documents
)

/// Finds relevant documents
func findRelevantDocuments(
    for query: String,
    threshold: Float = 0.7, // cutoff for matching, tunable
    limit: Int = 5
) async throws -> [String] {
    // Get an embedding for the query, using the same model
    // as the documents so the vectors are comparable
    let queryEmbedding = try await client.embeddings(
        model: "nomic-embed-text",
        texts: [query]
    )[0]

    // See: https://en.wikipedia.org/wiki/Cosine_similarity
    func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
        let dotProduct = zip(a, b).map(*).reduce(0, +)
        func magnitude(_ v: [Float]) -> Float {
            sqrt(v.map { $0 * $0 }.reduce(0, +))
        }
        return dotProduct / (magnitude(a) * magnitude(b))
    }

    // Find documents above the similarity threshold,
    // best matches first, at most `limit` results
    let rankedDocuments = zip(embeddings, documents)
        .map { embedding, document in
            (similarity: cosineSimilarity(embedding, queryEmbedding),
             document: document)
        }
        .filter { $0.similarity >= threshold }
        .sorted { $0.similarity > $1.similarity }
        .prefix(limit)

    return rankedDocuments.map(\.document)
}
For simple use cases, you can also use Apple’s [23]Natural Language framework
for text embeddings. They’re fast and don’t require additional dependencies.
import NaturalLanguage

let embedding = NLEmbedding.wordEmbedding(for: .english)
let vector = embedding?.vector(for: "swift")
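NLEmbedding can also compare words directly, without any manual vector math. A quick sketch: `distance(between:and:)` returns smaller values for more similar words (cosine distance by default), and `neighbors(for:maximumCount:)` finds the nearest words in the embedding space.

```swift
import NaturalLanguage

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    // Smaller distance = more similar
    let distance = embedding.distance(between: "swift", and: "rapid")
    // Nearest words to "swift" in the embedding space
    let neighbors = embedding.neighbors(for: "swift", maximumCount: 5)
    print(distance, neighbors)
}
```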
[24]Building a RAG System

Embeddings really shine when combined with text generation in a RAG (Retrieval
Augmented Generation) workflow. Instead of asking the model to generate
information from its training data, we can ground its responses in our own
documents by:

1. Converting documents into embeddings
2. Finding relevant documents based on the query
3. Using those documents as context for generation

Here’s a simple example:
let query = "What were AAPL's earnings in Q3 2024?"
let relevantDocs = try await findRelevantDocuments(for: query)
let context = """
Use the following documents to answer the question.
If the answer isn't contained in the documents, say so.

Documents:
\(relevantDocs.joined(separator: "\n---\n"))

Question: \(query)
"""

let response = try await client.generate(
    model: "llama3.2",
    prompt: context
)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

To summarize: Different models have different capabilities.
• Models like [25]llama3.2 and [26]deepseek-r1 generate text.
  □ Some text models have “base” or “instruct” variants, suitable for
    fine-tuning or chat completion, respectively.
  □ Some text models are tuned to support [27]tool use, which lets them
    perform more complex tasks and interact with the outside world.
• Models like [28]llama3.2-vision can take images along with text as inputs.
• Models like [29]nomic-embed-text create numerical vectors that capture
  semantic meaning.

With Ollama, you get unlimited access to a wealth of these and many more
open-source language models.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

So, what can you build with all of this? Here’s just one example:
[30]Nominate.app

[31]Nominate is a macOS app that uses Ollama to intelligently rename PDF files
based on their contents.

Like many of us striving for a paperless lifestyle, you might find yourself
scanning documents only to end up with cryptically-named PDFs like
Scan2025-02-03_123456.pdf. Nominate solves this by combining AI with
traditional NLP techniques to automatically generate descriptive filenames
based on document contents.
The app leverages several technologies we’ve discussed:

• Ollama’s API for content analysis via the ollama-swift package
• Apple’s PDFKit for OCR
• The Natural Language framework for text processing
• Foundation’s DateFormatter for parsing dates

Nominate performs all processing locally. Your documents never leave your
computer. This is a key advantage of running models locally versus using cloud
APIs.
[32]Looking Ahead

“The future is already here – it’s just not evenly distributed yet.”
William Gibson

Think about the timelines:

• Apple Intelligence was announced last year.
• Swift came out 10 years ago.
• SwiftUI 6 years ago.

If you wait for Apple to deliver on its promises, you’re going to miss out on
the most important technological shift in a generation.

The future is here today. You don’t have to wait. With Ollama, you can start
building the next generation of AI-powered apps right now.
NSMutableHipster

Questions? Corrections? [33]Issues and [34]pull requests are always welcome.

This article uses Swift version 6.0. Find status information for all articles
on the [35]status page.

Written by [36]Mattt

[37]Mattt ([38]@mattt) is a writer and developer in Portland, Oregon.

🅭 🅯 🄏 NSHipster.com is released under a [39]Creative Commons BY-NC License.
References:

[1] https://nshipster.com/
[2] https://nshipster.com/ollama/
[3] https://nshipster.com/authors/mattt/
[4] https://www.apple.com/apple-intelligence/
[5] https://nshipster.com/ollama/#what-is-ollama
[6] https://brew.sh/
[7] https://ollama.com/download
[8] https://ollama.com/library/llama3.2
[9] https://github.com/ggerganov/llama.cpp
[10] https://opencontainers.org/
[11] https://nshipster.com/ollama/#whats-the-big-deal-about-running-models-locally
[12] https://en.wikipedia.org/wiki/Jevons_paradox
[13] https://knowyourmeme.com/photos/2546581-shoggoth-with-smiley-face-artificial-intelligence
[14] https://en.wiktionary.org/wiki/exocortex
[15] https://nshipster.com/ollama/#building-macos-apps-with-ollama
[16] https://github.com/ollama/ollama/blob/main/docs/api.md
[17] https://en.wikipedia.org/wiki/Leet
[18] https://github.com/mattt/ollama-swift
[19] https://nshipster.com/ollama/#text-completions
[20] https://nshipster.com/ollama/#chat-completions
[21] https://nshipster.com/ollama/#generating-text-embeddings
[22] https://en.wikipedia.org/wiki/Word_embedding
[23] https://developer.apple.com/documentation/naturallanguage/
[24] https://nshipster.com/ollama/#building-a-rag-system
[25] https://ollama.com/library/llama3.2
[26] https://ollama.com/library/deepseek-r1
[27] https://ollama.com/blog/tool-support
[28] https://ollama.com/library/llama3.2-vision
[29] https://ollama.com/library/nomic-embed-text
[30] https://nshipster.com/ollama/#nominateapp
[31] https://github.com/nshipster/nominate
[32] https://nshipster.com/ollama/#looking-ahead
[33] https://github.com/NSHipster/articles/issues
[34] https://github.com/NSHipster/articles/blob/master/2025-02-14-ollama.md
[35] https://nshipster.com/status/
[36] https://nshipster.com/authors/mattt/
[37] https://github.com/mattt
[38] https://twitter.com/mattt
[39] https://creativecommons.org/licenses/by-nc/4.0/