One process programming notes (with Go and SQLite)

2018 July 30

Blog-ified version of a talk I gave at [2]Go Northwest.

This content covers my recent exploration of writing internet services, iOS apps, and macOS programs as an indie developer.

There are several topics here that should each have their own blog post. But as I have a lot of programming to do, I am going to put these notes up as is and split the material out some time later.

My focus has been on how to adapt the lessons I have learned working in teams at Google to a single programmer building a small business. There are many great engineering practices in Silicon Valleyʼs big companies and well-capitalized VC firms, but one person does not have enough bandwidth to use them all and still write software. The exercise for me is: what to keep and what must go.

If I have been doing it right, the technology and techniques described here will sound easy. I have to fit it all in my head while having enough capacity left over to write software people want. Every extra thing has great cost, especially rarely touched software that comes back to bite in the middle of the night six months later.

Two key technologies I have decided to use are Go and SQLite.

A brief introduction to SQLite

SQLite is an implementation of SQL. Unlike traditional database implementations like PostgreSQL or MySQL, SQLite is a self-contained C library designed to be embedded into programs. It has been built by D. Richard Hipp since its release in 2000, and in the past 18 years other open source contributors have helped. At this point it has been around for most of the time I have been programming and is a core part of my programming toolbox.

Hands-on with the SQLite command line tool

Rather than talk through SQLite in the abstract, let me show it to you.

A kind person on Kaggle has [3]provided a CSV file of the plays of Shakespeare. Letʼs build an SQLite database out of it.

$ head shakespeare_data.csv
"Dataline","Play","PlayerLinenumber","ActSceneLine","Player","PlayerLine"
"1","Henry IV",,,,"ACT I"
"2","Henry IV",,,,"SCENE I. London. The palace."
"3","Henry IV",,,,"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
"4","Henry IV","1","1.1.1","KING HENRY IV","So shaken as we are, so wan with care,"
"5","Henry IV","1","1.1.2","KING HENRY IV","Find we a time for frighted peace to pant,"
"6","Henry IV","1","1.1.3","KING HENRY IV","And breathe short-winded accents of new broils"
"7","Henry IV","1","1.1.4","KING HENRY IV","To be commenced in strands afar remote."
"8","Henry IV","1","1.1.5","KING HENRY IV","No more the thirsty entrance of this soil"
"9","Henry IV","1","1.1.6","KING HENRY IV","Shall daub her lips with her own children's blood,"

First, letʼs use the sqlite command line tool to create a new database and import the CSV.

$ sqlite3 shakespeare.db
sqlite> .mode csv
sqlite> .import shakespeare_data.csv import

Done! A couple of SELECTs will let us quickly see if it worked.

sqlite> SELECT count(*) FROM import;
111396
sqlite> SELECT * FROM import LIMIT 10;
1,"Henry IV","","","","ACT I"
2,"Henry IV","","","","SCENE I. London. The palace."
3,"Henry IV","","","","Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
4,"Henry IV",1,1.1.1,"KING HENRY IV","So shaken as we are, so wan with care,"
5,"Henry IV",1,1.1.2,"KING HENRY IV","Find we a time for frighted peace to pant,"
6,"Henry IV",1,1.1.3,"KING HENRY IV","And breathe short-winded accents of new broils"
7,"Henry IV",1,1.1.4,"KING HENRY IV","To be commenced in strands afar remote."
8,"Henry IV",1,1.1.5,"KING HENRY IV","No more the thirsty entrance of this soil"
9,"Henry IV",1,1.1.6,"KING HENRY IV","Shall daub her lips with her own children's blood,"

Looks good! Now we can do a little cleanup. The original CSV contains a column called ActSceneLine that uses dots to encode Act number, Scene number, and Line number. Those would look much nicer as their own columns.

sqlite> CREATE TABLE plays (rowid INTEGER PRIMARY KEY, play, linenumber, act, scene, line, player, text);
sqlite> .schema
CREATE TABLE import (rowid primary key, play, playerlinenumber, actsceneline, player, playerline);
CREATE TABLE plays (rowid primary key, play, linenumber, act, scene, line, player, text);
sqlite> INSERT INTO plays SELECT
            rowid,
            play,
            playerlinenumber AS linenumber,
            substr(actsceneline, 1, 1) AS act,
            substr(actsceneline, 3, 1) AS scene,
            substr(actsceneline, 5, 5) AS line,
            player,
            playerline AS text
        FROM import;

(The substr above can be improved by using instr to find the ʼ.ʼ characters. Exercise left for the reader.)

Here we used the INSERT ... SELECT syntax to build a table out of another table. The ActSceneLine column was split apart using the builtin SQLite function substr, which slices strings.

The result:

sqlite> SELECT * FROM plays LIMIT 10;
1,"Henry IV","","","","","","ACT I"
2,"Henry IV","","","","","","SCENE I. London. The palace."
3,"Henry IV","","","","","","Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
4,"Henry IV",1,1,1,1,"KING HENRY IV","So shaken as we are, so wan with care,"
5,"Henry IV",1,1,1,2,"KING HENRY IV","Find we a time for frighted peace to pant,"
6,"Henry IV",1,1,1,3,"KING HENRY IV","And breathe short-winded accents of new broils"
7,"Henry IV",1,1,1,4,"KING HENRY IV","To be commenced in strands afar remote."
8,"Henry IV",1,1,1,5,"KING HENRY IV","No more the thirsty entrance of this soil"
9,"Henry IV",1,1,1,6,"KING HENRY IV","Shall daub her lips with her own children's blood,"

Now we have our data, let us search for something:

sqlite> SELECT * FROM plays WHERE text LIKE "whether tis nobler%";
sqlite>

That did not work. Hamlet definitely says that, but perhaps the text formatting is slightly off. SQLite to the rescue. It ships with a Full Text Search extension compiled in. Let us index all of Shakespeare with FTS5:

sqlite> CREATE VIRTUAL TABLE playsearch USING fts5(playsrowid, text);
sqlite> INSERT INTO playsearch SELECT rowid, text FROM plays;

Now we can search for our soliloquy:

sqlite> SELECT rowid, text FROM playsearch WHERE text MATCH "whether tis nobler";
34232|Whether 'tis nobler in the mind to suffer

Success! The act and scene can be acquired by joining with our original table.

sqlite> SELECT play, act, scene, line, player, plays.text
        FROM playsearch
        INNER JOIN plays ON playsearch.playsrowid = plays.rowid
        WHERE playsearch.text MATCH "whether tis nobler";
Hamlet|3|1|65|HAMLET|Whether 'tis nobler in the mind to suffer

Letʼs clean up.

sqlite> DROP TABLE import;
sqlite> VACUUM;

Finally, what does all of this look like on the file system?

$ ls -l
-rwxr-xr-x@ 1 crawshaw staff 10188854 Apr 27  2017 shakespeare_data.csv
-rw-r--r--  1 crawshaw staff 22286336 Jul 25 22:05 shakespeare.db

There you have it. The SQLite database contains two full copies of the plays of Shakespeare, one with a full text search index, and stores both of them in about twice the space it takes the original CSV file to store one. Not bad.

That should give you a feel for the i-t-e of SQLite.

And scene.

Using SQLite from Go

The standard database/sql

There are a number of cgo-based [4]database/sql drivers available for SQLite. The most popular one appears to be [5]github.com/mattn/go-sqlite3. It gets the job done and is probably what you want.

Using the database/sql package it is straightforward to open an SQLite database and execute SQL statements on it. For example, we can run the FTS query from earlier using this Go code:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "shakespeare.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	stmt, err := db.Prepare(`
		SELECT play, act, scene, plays.text
		FROM playsearch
		INNER JOIN plays ON playsearch.playsrowid = plays.rowid
		WHERE playsearch.text MATCH ?;`)
	if err != nil {
		log.Fatal(err)
	}
	var play, text string
	var act, scene int
	err = stmt.QueryRow("whether tis nobler").Scan(&play, &act, &scene, &text)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s %d:%d: %q\n", play, act, scene, text)
}

Executing it yields:

Hamlet 3:1: "Whether 'tis nobler in the mind to suffer"

A low-level wrapper: crawshaw.io/sqlite

Just as SQLite steps beyond the basics of SELECT, INSERT, UPDATE, DELETE with full-text search, it has several other interesting features and extensions that cannot be accessed by SQL statements alone. These need specialized interfaces, and many of the interfaces are not supported by any of the existing drivers.

So I wrote my own. You can get it from [6]crawshaw.io/sqlite. In particular, it supports the streaming blob interface and the [7]session extension, and implements the necessary sqlite_unlock_notify machinery to make good use of the [8]shared cache for connection pools. I am going to cover these features through two use case studies: the client and the cloud.

cgo

All of these approaches rely on cgo for integrating C into Go. This is straightforward to do, but adds some operational complexity. Building a Go program using SQLite requires a C compiler for the target.

In practice, this means if you develop on macOS you need to install a cross-compiler for linux.

Typical concerns about the impact on software quality of adding C code to Go do not apply to SQLite, as it has an extraordinary degree of testing. The quality of the code is exceptional.

Go and SQLite for the client

I am building an [9]iOS app, with almost all the code written in Go and the UI provided by a web view. This app has a full copy of the user data; it is not a thin view onto an internet server. This means storing a large amount of local, structured data, on-device full text searching, background tasks working on the database in a way that does not disrupt the UI, and syncing DB changes to a backup in the cloud.

That is a lot of moving parts for a client. More than I want to write in JavaScript, and more than I want to write in Swift and then have to promptly rewrite if I ever manage to build an Android app. More importantly, the server is in Go, and I am one independent developer. It is absolutely vital that I reduce the number of moving pieces in my development environment to the smallest possible number. Hence the effort to build (the big bits of) a client using the exact same technology as my server.

The Session extension

The session extension lets you start a session on an SQLite connection. All changes made to the database through that connection are bundled into a patchset blob. The extension also provides methods for applying the generated patchset to a table.

func (conn *Conn) CreateSession(db string) (*Session, error)

func (s *Session) Changeset(w io.Writer) error

func (conn *Conn) ChangesetApply(
	r io.Reader,
	filterFn func(tableName string) bool,
	conflictFn func(ConflictType, ChangesetIter) ConflictAction,
) error

This can be used to build a very simple client-sync system. Collect the changes made in a client, periodically bundle them up into a changeset, and upload it to the server, where it is applied to a backup copy of the database. If another client changes the database, then the server advertises it to the client, which downloads a changeset and applies it.

This requires a bit of care in the database design. The reason I kept the FTS table separate in the Shakespeare example is that I keep my FTS tables in a separate attached database (which, in SQLite, means a different file). The cloud backup database never generates the FTS tables; the client is free to generate the tables in a background thread, and they can lag behind data backups.

Another point of care is minimizing conflicts. The biggest one is AUTOINCREMENT keys. By default the primary key of a rowid table is incremented, which means if you have multiple clients generating rowids you will see lots of conflicts.

I have been trialing two different solutions. The first is having each client register a rowid range with the server and only allocate from its own range. It works. The second is randomly generating int64 values, and relying on the low collision rate. So far it works too. Both strategies have risks, and I havenʼt decided which is better.

In practice, I have found I have to limit DB updates to a single connection to keep changeset quality high. (A changeset does not see changes made on other connections.) To do this I maintain a read-only pool of connections and a single guarded read-write connection in a pool of 1. The code only grabs the read-write connection when it needs it, and the read-only connections are enforced by the read-only bit on the SQLite connection.

Nested Transactions

The database/sql package encourages the use of SQL transactions with its Tx type, but this does not play well with nested transactions. Nesting is a concept implemented by SAVEPOINT / RELEASE in SQL, and it makes for surprisingly composable code.

If a function needs to make multiple statements in a transaction, it can open with a SAVEPOINT, then defer a call to RELEASE if the function produces no Go error, or, if it does, instead call ROLLBACK and return the error.

func f(conn *sqlite.Conn) (err error) {
	conn...SAVEPOINT
	defer func() {
		if err == nil {
			conn...RELEASE
		} else {
			conn...ROLLBACK
		}
	}()
}

Now if this transactional function f needs to call another transactional function g, then g can use exactly the same strategy and f can call it in a very traditional Go way:

if err := g(conn); err != nil {
	return err // all changes in f will be rolled back by the defer
}

The function g is also perfectly safe to use in its own right, as it has its own transaction.

I have been using this SAVEPOINT + defer RELEASE-or-ROLLBACK semantics for several months now and find it invaluable. It makes it easy to safely wrap code in SQL transactions.

The example above, however, is a bit bulky, and there are some edge cases that need to be handled. (For example, if the RELEASE fails, then an error needs to be returned.) So I have wrapped this up in a utility:

func f(conn *sqlite.Conn) (err error) {
	defer sqlitex.Save(conn)(&err)

	// Code is transactional and can be stacked
	// with other functions that call sqlitex.Save.
}

The first time you see sqlitex.Save in action it can be a little off-putting; at least it was for me when I first created it. But I quickly got used to it, and it does a lot of heavy lifting. The first call to sqlitex.Save opens a SAVEPOINT on the conn and returns a closure that either RELEASEs or ROLLBACKs depending on the value of err, and sets err if necessary.

Go and SQLite in the cloud

I have spent several months now redesigning services I have encountered before and designing services for problems I would like to work on going forward. The process has led me to a general design that works for many problems and that I quite enjoy building.

It can be summarized as 1 VM, 1 Zone, 1 process programming.

If this sounds ridiculously simplistic to you, I think thatʼs good! It is simple. It does not meet all sorts of requirements that we would like our modern fancy cloud services to meet. It is not "serverless", which means when a service is extremely small it does not run for free, and when a service grows it does not automatically scale. Indeed, there is an explicit scaling limit. Right now the best server you can get from Amazon is roughly:

  * 128 CPU threads at ~4GHz
  * 4TB RAM
  * 25 Gbit ethernet
  * 10 Gbps NAS
  * hours of yearly downtime

That is a huge potential downside of one process programming. However, I claim that is a livable limit.

I claim typical services do not hit this scaling limit.

If you are building a small business, most products can grow and become profitable well under this limit for years. When you see the limit approaching in the next year or two, you have a business with revenue to hire more than one engineer, and the new team can, in the face of radically changing business requirements, rewrite the service.

Reaching this limit is a good problem to have, because when it comes you will have plenty of time to deal with it and the human resources you need to solve it well.

Early in the life of a small business you donʼt, and every hour you spend trying to work beyond this scaling limit is an hour that would have been better spent talking to your customers about their needs.

The principle at work here is:

Donʼt use N computers when 1 will do.

To go into a bit more technical detail:

I run a single VM on AWS, in a single availability zone. The VM has three EBS volumes (EBS is Amazonʼs name for NAS). The first holds the OS, logs, temporary files, and any ephemeral SQLite databases that are generated from the main databases, e.g. FTS tables. The second holds the primary SQLite database for the main service. The third holds the customer sync SQLite databases.

The system is configured to periodically snapshot the system EBS volume and the customer EBS volumes to S3, the Amazon geo-redundant blob store. This is a relatively cheap operation that can be scripted, because only blocks that change are copied.

The main EBS volume is backed up to S3 very regularly, by custom code that flushes the WAL cache. Iʼll explain that in a bit.

The service is a single Go binary running on this VM. The machine has plenty of extra RAM that is used by linuxʼs disk cache. (And that can be used by a second copy of the service spinning up for low down-time replacement.)

The result of this is a service that has at most tens of hours of downtime a year, about as much chance of suffering block loss as a physical computer with a RAID5 array, and active offsite backups being made every few minutes to a distributed system that is built and maintained by a large team.

This system is astonishingly simple. I shell into one machine. It is a linux machine. I have a deploy script for the service that is ten lines long. Almost all of my performance work is done with pprof.

On a medium sized VM I can clock 5-6 thousand concurrent requests with only a few hours of performance tuning. On the largest machine AWS has, tens of thousands.

Now to talk a little more about the particulars of the stack:

Shared cache and WAL

To make the server extremely concurrent there are two important SQLite features I use. The first is the shared cache, which lets me allocate one large pool of memory to the database page cache, which many concurrent connections can then use simultaneously. This requires some support in the driver for sqlite_unlock_notify so user code doesnʼt need to deal with locking events, but that is transparent to end user code.

The second is the Write Ahead Log. This is a mode SQLite can be knocked into at the beginning of a connection which changes the way it writes transactions to disk. Instead of locking the database and making modifications along with a rollback journal, it appends the new change to a separate file. This allows readers to work concurrently with the writer. The WAL has to be flushed periodically by SQLite, which involves locking the database and writing the changes from it. There are default settings for doing this.

I override these and execute WAL flushes manually from a package that, when it is done, also triggers an S3 snapshot. This package is called reallyfsync, and if I can work out how to test it properly I will make it open source.

Incremental Blob API

Another smaller feature, but one important to my particular server, is SQLiteʼs [10]incremental blob API. This allows a field of bytes to be read and written in the DB without storing all the bytes in memory simultaneously, which matters when it is possible for each request to be working with hundreds of megabytes, but you want tens of thousands of potential concurrent requests.

This is one of the places where the driver deviates from being a close-to-cgo wrapper to be more [11]Go-like:

type Blob
    func (blob *Blob) Close() error
    func (blob *Blob) Read(p []byte) (n int, err error)
    func (blob *Blob) ReadAt(p []byte, off int64) (n int, err error)
    func (blob *Blob) Seek(offset int64, whence int) (int64, error)
    func (blob *Blob) Size() int64
    func (blob *Blob) Write(p []byte) (n int, err error)
    func (blob *Blob) WriteAt(p []byte, off int64) (n int, err error)

This looks a lot like a file, and indeed can be used like a file, with one caveat: the size of a blob is set when it is created. (As such, I still find temporary files to be useful.)

Designing with one process programming

I start with: Do you really need N computers?

Some problems really do. For example, you cannot build a low-latency index of the public internet with only 4TB of RAM. You need a lot more. These problems are great fun, and we like to talk a lot about them, but they are a relatively small fraction of all the code written. So far all the projects I have been developing post-Google fit on 1 computer.

There are also more common sub-problems that are hard to solve with one computer. If you have a global customer base and need low latency to your server, the speed of light gets in the way. But many of these problems can be solved with relatively straightforward CDN products.

Another great solution to the speed of light is geo-sharding. Have complete and independent copies of your service in multiple datacenters, and move each userʼs data to the service near them. This can be as easy as having one small global redirect database (maybe SQLite on geo-redundant NFS!) redirecting the user to a specific DNS name like {us-east, us-west}.mservice.com.

Most problems do fit in one computer, up to a point. Spend some time determining where that point is. If it is years away, there is a good chance one computer will do.

Indie dev techniques for the corporate programmer

Even if you do not write code in this particular technology stack and you are not an independent developer, there is value here. Use the one big VM, one zone, one process stack of Go, SQLite, and snapshot backups as a hypothetical tool to test your designs.

So add a hypothetical step to your design process: if you solved your problem on this stack with one computer, how far could you get? How many customers could you support? At what size would you need to rewrite your software?

If this indie mini stack would last your business years, you might want to consider delaying the adoption of modern cloud software.

If you are a programmer at a well-capitalized company, you may also want to consider what development looks like for small internal or experimental projects. Do your coworkers have to use large complex distributed systems for policy reasons? Many of these projects will never need to scale beyond one computer, or if they do they will need a rewrite to deal with shifting requirements. In which case, find a way to make an indie stack, linux VMs with a file system, available for prototyping and experimentation.

__________________________________________________________________

[12]Index
[13]github.com/crawshaw
[14]twitter.com/davidcrawshaw
david@zentus.com

References

 1. file:///atom.xml
 2. https://gonorthwest.io/
 3. https://www.kaggle.com/kingburrito666/shakespeare-plays
 4. https://golang.org/pkg/database/sql
 5. https://github.com/mattn/go-sqlite3
 6. https://crawshaw.io/sqlite
 7. https://www.sqlite.org/sessionintro.html
 8. https://www.sqlite.org/sharedcache.html
 9. https://www.posticulous.com/
10. https://www.sqlite.org/c3ref/blob_open.html
11. https://godoc.org/crawshaw.io/sqlite#Blob
12. file:///
13. https://github.com/crawshaw
14. https://twitter.com/davidcrawshaw