[2]Ollama

Written by [3]Mattt, February 14th, 2025

“Only Apple can do this”
Variously attributed to Tim Cook

Apple introduced [4]Apple Intelligence at WWDC 2024. After waiting almost a
year for Apple to, in Craig Federighi’s words, “get it right”, its promise of
“AI for the rest of us” feels just as distant as ever.

Can we take a moment to appreciate the name? Apple Intelligence. AI. That’s
some S-tier semantic appropriation. On the level of jumping on “podcast” before
anyone knew what else to call that.

While we wait for Apple Intelligence to arrive on our devices, something
remarkable is already running on our Macs. Think of it as a locavore approach
to artificial intelligence: homegrown, sustainable, and available year-round.

This week on NSHipster, we’ll look at how you can use Ollama to run LLMs
locally on your Mac — both as an end-user and as a developer.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

[5]What is Ollama?

Ollama is the easiest way to run large language models on your Mac. You can
think of it as “Docker for LLMs” - a way to pull, run, and manage AI models as
easily as containers.

Download Ollama with [6]Homebrew or directly from [7]their website. Then pull
and run [8]llama3.2 (2GB):

$ brew install --cask ollama
$ ollama run llama3.2
>>> Tell me a joke about Swift programming.
What's a Apple developer's favorite drink?
The Kool-Aid.

Under the hood, Ollama is powered by [9]llama.cpp. But where llama.cpp provides
the engine, Ollama gives you a vehicle you’d actually want to drive — handling
all the complexity of model management, optimization, and inference.

Similar to how Dockerfiles define container images, Ollama uses Modelfiles to
configure model behavior:

FROM mistral:latest
PARAMETER temperature 0.7
TEMPLATE """
You are a helpful assistant.

User: {{ .Prompt }}
Assistant: """

Build it into a runnable model with ollama create my-assistant -f Modelfile
(where my-assistant is any name you like), then chat with it using
ollama run my-assistant.

Ollama uses the [10]Open Container Initiative (OCI) standard to distribute
models. Each model is split into layers and described by a manifest, the same
approach used by Docker containers:

{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.ollama.image.config.v1+json",
    "digest": "sha256:..."
  },
  "layers": [
    {
      "mediaType": "application/vnd.ollama.image.layer.v1+json",
      "digest": "sha256:...",
      "size": 4019248935
    }
  ]
}

Overall, Ollama’s approach is thoughtful and well-engineered. And best of all,
it just works.

[11]What’s the big deal about running models locally?

[12]Jevons paradox states that, as something becomes more efficient, we tend to
use more of it, not less.

Having AI on your own device changes everything. When computation becomes
essentially free, you start to see intelligence differently.

While frontier models like GPT-4 and Claude are undeniably miraculous, there’s
something to be said for the small miracle of running open models locally.

  • Privacy: Your data never leaves your device. Essential for working with
    sensitive information.
  • Cost: Run 24/7 without usage meters ticking. No more rationing prompts like
    ’90s cell phone minutes. Just a fixed, up-front cost for unlimited
    inference.
  • Latency: No network round-trips means faster responses. Your
    /M\d Mac((Book( Pro| Air)?)|Mini|Studio)/ can easily generate dozens of
    tokens per second. (Try to keep up!)
  • Control: No black-box [13]RLHF or censorship. The AI works for you, not the
    other way around.
  • Reliability: No outages or API quota limits. 100% uptime for your
    [14]exocortex. Like having Wikipedia on a thumb drive.

[15]Building macOS Apps with Ollama

Ollama also exposes an [16]HTTP API on port 11434 ([17]leetspeak for llama 🦙).
This makes it easy to integrate with any programming language or tool.

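
To get a feel for how small that surface area is, here’s a minimal sketch of
calling the /api/generate endpoint directly with URLSession. The request and
response shapes follow Ollama’s API documentation, but only the fields used
here are modeled, and error handling is elided:

```swift
import Foundation

// Minimal request/response models for Ollama's /api/generate endpoint.
struct GenerateRequest: Codable {
    let model: String
    let prompt: String
    let stream: Bool
}

struct GenerateResponse: Codable {
    let response: String
}

func generate(model: String, prompt: String) async throws -> String {
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // stream: false asks for a single JSON object instead of a stream of chunks
    request.httpBody = try JSONEncoder().encode(
        GenerateRequest(model: model, prompt: prompt, stream: false)
    )
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(GenerateResponse.self, from: data).response
}
```

The same pattern works from any language with an HTTP client, which is what
makes a local HTTP API such a convenient integration point.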

To that end, we’ve created the [18]Ollama Swift package to help developers
integrate Ollama into their apps.

[19]Text Completions

The simplest way to use a language model is to generate text from a prompt:

import Ollama

let client = Client.default
let response = try await client.generate(
    model: "llama3.2",
    prompt: "Tell me a joke about Swift programming.",
    options: ["temperature": 0.7]
)
print(response.response)
// How many Apple engineers does it take to document an API?
// None - that's what WWDC videos are for.

[20]Chat Completions

For more structured interactions, you can use the chat API to maintain a
conversation with multiple messages and different roles:

let initialResponse = try await client.chat(
    model: "llama3.2",
    messages: [
        .system("You are a helpful assistant."),
        .user("What city is Apple located in?")
    ]
)
print(initialResponse.message.content)
// Apple's headquarters, known as the Apple Park campus, is located in Cupertino, California.
// The company was originally founded in Los Altos, California, and later moved to Cupertino in 1997.

let followUp = try await client.chat(
    model: "llama3.2",
    messages: [
        .system("You are a helpful assistant."),
        .user("What city is Apple located in?"),
        .assistant(initialResponse.message.content),
        .user("Please summarize in a single word")
    ]
)
print(followUp.message.content)
// Cupertino

[21]Generating text embeddings

[22]Embeddings convert text into high-dimensional vectors that capture semantic
meaning. These vectors can be used to find similar content or perform semantic
search.

For example, if you wanted to find documents similar to a user’s query:

let documents: [String] = …

// Convert text into vectors we can compare for similarity
let embeddings = try await client.embeddings(
    model: "nomic-embed-text",
    texts: documents
)

/// Finds relevant documents
func findRelevantDocuments(
    for query: String,
    threshold: Float = 0.7, // cutoff for matching, tunable
    limit: Int = 5
) async throws -> [String] {
    // Get embedding for the query,
    // using the same model as the document embeddings
    let queryEmbedding = try await client.embeddings(
        model: "nomic-embed-text",
        texts: [query]
    ).first!

    // See: https://en.wikipedia.org/wiki/Cosine_similarity
    func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
        let dotProduct = zip(a, b).map(*).reduce(0, +)
        func magnitude(_ v: [Float]) -> Float {
            v.map { $0 * $0 }.reduce(0, +).squareRoot()
        }
        return dotProduct / (magnitude(a) * magnitude(b))
    }

    // Find documents above similarity threshold
    let rankedDocuments = zip(embeddings, documents)
        .map { embedding, document in
            (similarity: cosineSimilarity(embedding, queryEmbedding),
             document: document)
        }
        .filter { $0.similarity >= threshold }
        .sorted { $0.similarity > $1.similarity }
        .prefix(limit)

    return rankedDocuments.map(\.document)
}

For simple use cases, you can also use Apple’s [23]Natural Language framework
for text embeddings. It’s fast and doesn’t require additional dependencies.

import NaturalLanguage

let embedding = NLEmbedding.wordEmbedding(for: .english)
let vector = embedding?.vector(for: "swift")

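
NLEmbedding can also compare those vectors for you. As a sketch (the word
choices here are arbitrary examples), cosine distance and nearest-neighbor
lookup are built in:

```swift
import NaturalLanguage

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    // Cosine distance between two word vectors: smaller means more similar
    let distance = embedding.distance(
        between: "swift",
        and: "fast",
        distanceType: .cosine
    )
    print(distance)

    // The closest words in embedding space
    for (word, distance) in embedding.neighbors(for: "swift", maximumCount: 5) {
        print(word, distance)
    }
}
```

Note that these built-in word embeddings operate on single words, whereas
models like nomic-embed-text embed whole passages of text.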

[24]Building a RAG System

Embeddings really shine when combined with text generation in a RAG (Retrieval
Augmented Generation) workflow. Instead of asking the model to generate
information from its training data, we can ground its responses in our own
documents by:

 1. Converting documents into embeddings
 2. Finding relevant documents based on the query
 3. Using those documents as context for generation

Here’s a simple example:

let query = "What were AAPL's earnings in Q3 2024?"
let relevantDocs = try await findRelevantDocuments(for: query)
let context = """
Use the following documents to answer the question.
If the answer isn't contained in the documents, say so.

Documents:
\(relevantDocs.joined(separator: "\n---\n"))

Question: \(query)
"""

let response = try await client.generate(
    model: "llama3.2",
    prompt: context
)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

To summarize: Different models have different capabilities.

  • Models like [25]llama3.2 and [26]deepseek-r1 generate text.
      □ Some text models have “base” or “instruct” variants, suitable for
        fine-tuning or chat completion, respectively.
      □ Some text models are tuned to support [27]tool use, which lets them
        perform more complex tasks and interact with the outside world.
  • Models like [28]llama3.2-vision can take images along with text as inputs.
  • Models like [29]nomic-embed-text create numerical vectors that capture
    semantic meaning.

With Ollama, you get unlimited access to a wealth of these and many more
open-source language models.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

So, what can you build with all of this? Here’s just one example:

[30]Nominate.app

[31]Nominate is a macOS app that uses Ollama to intelligently rename PDF files
based on their contents.

Like many of us striving for a paperless lifestyle, you might find yourself
scanning documents only to end up with cryptically-named PDFs like
Scan2025-02-03_123456.pdf. Nominate solves this by combining AI with
traditional NLP techniques to automatically generate descriptive filenames
based on document contents.

The app leverages several technologies we’ve discussed:

  • Ollama’s API for content analysis, via the ollama-swift package
  • Apple’s PDFKit for OCR
  • The Natural Language framework for text processing
  • Foundation’s DateFormatter for parsing dates

Nominate performs all processing locally. Your documents never leave your
computer. This is a key advantage of running models locally versus using cloud
APIs.

[32]Looking Ahead

“The future is already here – it’s just not evenly distributed yet.”
William Gibson

Think about the timelines:

  • Apple Intelligence was announced last year.
  • Swift came out 10 years ago.
  • SwiftUI came out 6 years ago.

If you wait for Apple to deliver on its promises, you’re going to miss out on
the most important technological shift in a generation.

The future is here today. You don’t have to wait. With Ollama, you can start
building the next generation of AI-powered apps right now.

NSMutableHipster

Questions? Corrections? [33]Issues and [34]pull requests are always welcome.

This article uses Swift version 6.0. Find status information for all articles
on the [35]status page.

[36]Mattt

[37]Mattt ([38]@mattt) is a writer and developer in Portland, Oregon.

🅭 🅯 🄏 NSHipster.com is released under a [39]Creative Commons BY-NC License.

References:

 [1] https://nshipster.com/
 [2] https://nshipster.com/ollama/
 [3] https://nshipster.com/authors/mattt/
 [4] https://www.apple.com/apple-intelligence/
 [5] https://nshipster.com/ollama/#what-is-ollama
 [6] https://brew.sh/
 [7] https://ollama.com/download
 [8] https://ollama.com/library/llama3.2
 [9] https://github.com/ggerganov/llama.cpp
[10] https://opencontainers.org/
[11] https://nshipster.com/ollama/#whats-the-big-deal-about-running-models-locally
[12] https://en.wikipedia.org/wiki/Jevons_paradox
[13] https://knowyourmeme.com/photos/2546581-shoggoth-with-smiley-face-artificial-intelligence
[14] https://en.wiktionary.org/wiki/exocortex
[15] https://nshipster.com/ollama/#building-macos-apps-with-ollama
[16] https://github.com/ollama/ollama/blob/main/docs/api.md
[17] https://en.wikipedia.org/wiki/Leet
[18] https://github.com/mattt/ollama-swift
[19] https://nshipster.com/ollama/#text-completions
[20] https://nshipster.com/ollama/#chat-completions
[21] https://nshipster.com/ollama/#generating-text-embeddings
[22] https://en.wikipedia.org/wiki/Word_embedding
[23] https://developer.apple.com/documentation/naturallanguage/
[24] https://nshipster.com/ollama/#building-a-rag-system
[25] https://ollama.com/library/llama3.2
[26] https://ollama.com/library/deepseek-r1
[27] https://ollama.com/blog/tool-support
[28] https://ollama.com/library/llama3.2-vision
[29] https://ollama.com/library/nomic-embed-text
[30] https://nshipster.com/ollama/#nominateapp
[31] https://github.com/nshipster/nominate
[32] https://nshipster.com/ollama/#looking-ahead
[33] https://github.com/NSHipster/articles/issues
[34] https://github.com/NSHipster/articles/blob/master/2025-02-14-ollama.md
[35] https://nshipster.com/status/
[36] https://nshipster.com/authors/mattt/
[37] https://github.com/mattt
[38] https://twitter.com/mattt
[39] https://creativecommons.org/licenses/by-nc/4.0/