Add go notes

This commit is contained in:
David Eisinger
2023-06-13 10:52:21 -04:00
parent 9fd4072715
commit e1228a6e9a
3 changed files with 950 additions and 0 deletions


@@ -0,0 +1,365 @@
Why I love Go
September 12, 2022
David Yach
Director of Engineering at Google Cloud
I've been building software over the last four decades, as a
developer, manager, and executive in both small and large software
companies. I started my career working on commercial compilers, first
BASIC and then C. I have written a lot of code in many different
languages, and managed teams with even broader language usage.
I learned Go about 5 years ago when I was CTO at a startup/scaleup. At
the time, we were looking to move to a microservice architecture, and
that shift gave us the opportunity to consider moving away from the
incumbent language (Scala). As I read through the Go tutorials, my
compiler-writing background came back to me and I found myself
repeatedly thinking "That's cool, I know why the Go team did that!" So
I got hooked on the language design.
Learning
I have worked with many different computer languages over the years, so
I was not surprised I could quickly get started writing Go programs
after reading through the online documents and tutorials. But then when
I saw a new co-op student (a.k.a. intern) learn Go and write a
substantial prototype in their first two weeks on the job, it became
clear that Go was much easier to learn than many other languages.
Writing code
As I started writing my first Go programs, the first thing that struck
me was the blazing compiler speed. Starting my application was as fast
as or faster than with many interpreted languages, yet Go is a
compiled, strongly typed language. (I have an affinity for strongly
typed languages; I have spent way too much time tracking down obscure
issues in my own code in dynamically typed languages, where the same
issue would have been a compile error in a strongly typed language.)
Even better, in Go I often don't need to declare the type: the
compiler figures it out.
I was impressed with the standard Go library: it included many of the
capabilities required by modern applications, things like HTTP
support, JSON handling, and encryption. Many other languages required
you to use a third-party library for these features, and often there
were multiple competing libraries to choose from, adding another
decision point for the developer. With Go, I could go to the standard
library GoDoc and get started right away.
There were a few other language decisions that I found helpful. One is
that the compiler figures out if you are returning a pointer to a
local variable, and behind the scenes allocates the memory on the heap
rather than using the stack. This prevents bugs, and I find the code
more readable.
I also like that you don't declare that you support an interface. I
wasn't sure I would like this at first, because it isn't obvious
whether a type implements a particular interface, but I found greater
value in the fact that I wasn't dependent on the code author (even if
it was me!) to declare that the interface is implemented. This first
hit home when I used fmt.Println() and it automatically used the
String() method I had implemented, even though it hadn't occurred to
me that I was implementing the Stringer interface.
The last feature I'll note is the ability to do concurrent programming
through channels and goroutines. The model is simple to understand yet
powerful.
Reading code
After writing more Go code and starting to incorporate third-party
libraries, I had a realization that had never occurred to me before:
as a developer, I spend a lot of time reading code. In fact, I
probably spend more time reading code than writing it, once you start
counting code reviews, debugging, and evaluating third-party
libraries.
What was different about reading Go code? I would summarize it as "it
all looks the same." What do I mean by that? Go format ensures all the
braces are in the same spot; capitalized identifiers are exported;
there are no implicit conversions, even of internal types; and there
is no overloading of operators, functions, or methods. That means that
with Go code, "what you see is what you get," with no hidden meaning.
Of course, it doesn't help me to understand a complicated algorithm,
but it does mean that I can concentrate more on that algorithm,
because I don't have to understand whether "+" is overloaded, for
example.
I was also pleasantly surprised when I used GoDoc on one of my
projects and discovered that I had semi-reasonable documentation
without doing anything while writing the code other than adding
comments on my functions and methods, prompted by nagging from the IDE
I was using. I did spend some time cleaning up the comments after
that, but I'm not sure I would have even started that work if Go
hadn't given me a great starting point.
Testing code
Go test is part of the standard Go tools and supported by IDEs, making
it easy to get started creating unit tests for my code. And like the
standard Go library, having a standard way to write tests means I
don't have to evaluate external testing frameworks and select one. I
can also understand the tests when I'm evaluating a third-party
library.
Even better, the default behavior when running package tests in VSCode
is to enable Go's built-in code coverage. I had never taken code
coverage seriously working in other languages, partly because it was
often difficult to set up. But the immediate feedback (helped by the
blazing compile speed) gamified this for me, and I found myself adding
tests to increase code coverage (and finding new bugs along the way).
Go doesn't allow circular dependencies between packages. While this
has caused me some rethinking while writing code, I find it makes my
testing regimen easier to think about: if I depend on a package, I can
rely on that package to have its own tests covering its capabilities.
Deploying code
I learned Go at the same time we were migrating towards container-based
microservices. In that environment, the fact that Go produces a single,
self-contained executable makes it much easier and more efficient to
build and manage containers. I can build a container layer with one
single file, which is often a single-digit number of MB in size,
compared to our prior JVM-based containers which started with hundreds
of MB for the Java runtime then another layer for our application. (It
is easy to forget how much this overhead ends up costing in production,
particularly if you have hundreds or thousands of containers running).
Second, Go has built-in cross-compiling capabilities, so our
development machines, containers, and cloud hardware don't all have to
be on the same processor or operating system. For example, I can use a
Linux build machine to produce client executables for Linux, Mac and
Windows. Again, this takes away a complicated decision process due to
artificial constraints.
Finally, Go has established a well-defined set of principles for
versioning and compatibility. While not all pieces of this are
enforced, having the principles from an authoritative source helps
manage the real-life challenges of keeping your software supply chain
up to date. For example, it is strongly recommended that breaking
changes require a new major version number. While not enforced, it
leads the community to call out any open source package that violates
this principle.
What do I miss?
I did miss generics; thankfully, Go 1.18 added support. And I do wish
the standard library offered immutable collections (like Scala and
other functional languages). Embedding instead of inheritance works
much the same in many cases, but sometimes requires some deep
thinking.
My most frequent coding mistake is using a value receiver for a method
when I should have used a pointer receiver, then modifying the
receiver and expecting the changes to be visible when the method
returns. The code looks correct, and the right values get assigned if
I step through with a debugger or add prints, but the changes
disappear after the method returns. I think I would have preferred
receivers to be immutable: that would have caught these errors at
compile time, and in the few remaining cases where I wanted to modify
the receiver I could have copied it to a local variable.
In conclusion
As you can tell, I am a huge fan of Go, from even before I joined
Google. I am impressed by the language and ecosystem design, and by
the implementation. Go makes me a more productive developer, and I'm
more confident in the quality of the code I produce.
Go, give it a try! (https://go.dev/tour/list)
Posted in
* Application Modernization
* Application Development
* Open Source


@@ -0,0 +1,574 @@
One process programming notes (with Go and SQLite)
2018 July 30
Blog-ified version of a talk I gave at Go Northwest.
This content covers my recent exploration of writing internet services,
iOS apps, and macOS programs as an indie developer.
There are several topics here that should each have their own blog
post. But as I have a lot of programming to do I am going to put these
notes up as is and split the material out some time later.
My focus has been on how to adapt the lessons I have learned working
in teams at Google to a single programmer building a small business.
There are many great engineering practices in Silicon Valley's big
companies and well-capitalized VC-backed firms, but one person does
not have enough bandwidth to use them all and still write software.
The exercise for me is: what to keep, and what must go.
If I have been doing it right, the technology and techniques described
here will sound easy. I have to fit it all in my head while having
enough capacity left over to write software people want. Every extra
thing has great cost, especially rarely touched software that comes
back to bite in the middle of the night six months later.
Two key technologies I have decided to use are Go and SQLite.
A brief introduction to SQLite
SQLite is an implementation of SQL. Unlike traditional database
implementations like PostgreSQL or MySQL, SQLite is a self-contained C
library designed to be embedded into programs. It has been built by D.
Richard Hipp since its release in 2000, and in the past 18 years other
open source contributors have helped. At this point it has been around
most of the time I have been programming and is a core part of my
programming toolbox.
Hands-on with the SQLite command line tool
Rather than talk through SQLite in the abstract, let me show it to
you. A kind person on Kaggle has provided a CSV file of the plays of
Shakespeare. Let's build an SQLite database out of it.
$ head shakespeare_data.csv
"Dataline","Play","PlayerLinenumber","ActSceneLine","Player","PlayerLine"
"1","Henry IV",,,,"ACT I"
"2","Henry IV",,,,"SCENE I. London. The palace."
"3","Henry IV",,,,"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
"4","Henry IV","1","1.1.1","KING HENRY IV","So shaken as we are, so wan with care,"
"5","Henry IV","1","1.1.2","KING HENRY IV","Find we a time for frighted peace to pant,"
"6","Henry IV","1","1.1.3","KING HENRY IV","And breathe short-winded accents of new broils"
"7","Henry IV","1","1.1.4","KING HENRY IV","To be commenced in strands afar remote."
"8","Henry IV","1","1.1.5","KING HENRY IV","No more the thirsty entrance of this soil"
"9","Henry IV","1","1.1.6","KING HENRY IV","Shall daub her lips with her own children's blood,"
First, let's use the sqlite3 command line tool to create a new
database and import the CSV.
$ sqlite3 shakespeare.db
sqlite> .mode csv
sqlite> .import shakespeare_data.csv import
Done! A couple of SELECTs will let us quickly see if it worked.
sqlite> SELECT count(*) FROM import;
111396
sqlite> SELECT * FROM import LIMIT 10;
1,"Henry IV","","","","ACT I"
2,"Henry IV","","","","SCENE I. London. The palace."
3,"Henry IV","","","","Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
4,"Henry IV",1,1.1.1,"KING HENRY IV","So shaken as we are, so wan with care,"
5,"Henry IV",1,1.1.2,"KING HENRY IV","Find we a time for frighted peace to pant,"
6,"Henry IV",1,1.1.3,"KING HENRY IV","And breathe short-winded accents of new broils"
7,"Henry IV",1,1.1.4,"KING HENRY IV","To be commenced in strands afar remote."
8,"Henry IV",1,1.1.5,"KING HENRY IV","No more the thirsty entrance of this soil"
9,"Henry IV",1,1.1.6,"KING HENRY IV","Shall daub her lips with her own children's blood,"
Looks good! Now we can do a little cleanup. The original CSV contains
a column called ActSceneLine that uses dots to encode the Act number,
Scene number, and Line number. Those would look much nicer as their
own columns.
sqlite> CREATE TABLE plays (rowid INTEGER PRIMARY KEY, play, linenumber, act, scene, line, player, text);
sqlite> .schema
CREATE TABLE import (rowid primary key, play, playerlinenumber, actsceneline, player, playerline);
CREATE TABLE plays (rowid INTEGER PRIMARY KEY, play, linenumber, act, scene, line, player, text);
sqlite> INSERT INTO plays SELECT
    rowid,
    play,
    playerlinenumber AS linenumber,
    substr(actsceneline, 1, 1) AS act,
    substr(actsceneline, 3, 1) AS scene,
    substr(actsceneline, 5, 5) AS line,
    player,
    playerline AS text
    FROM import;
(The substr above can be improved by using instr to find the '.'
characters. Exercise left for the reader.)
Here we used the INSERT ... SELECT syntax to build a table out of
another table. The ActSceneLine column was split apart using the
builtin SQLite function substr, which slices strings.
The result:
sqlite> SELECT * FROM plays LIMIT 10;
1,"Henry IV","","","","","","ACT I"
2,"Henry IV","","","","","","SCENE I. London. The palace."
3,"Henry IV","","","","","","Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
4,"Henry IV",1,1,1,1,"KING HENRY IV","So shaken as we are, so wan with care,"
5,"Henry IV",1,1,1,2,"KING HENRY IV","Find we a time for frighted peace to pant,"
6,"Henry IV",1,1,1,3,"KING HENRY IV","And breathe short-winded accents of new broils"
7,"Henry IV",1,1,1,4,"KING HENRY IV","To be commenced in strands afar remote."
8,"Henry IV",1,1,1,5,"KING HENRY IV","No more the thirsty entrance of this soil"
9,"Henry IV",1,1,1,6,"KING HENRY IV","Shall daub her lips with her own children's blood,"
Now we have our data, let us search for something:
sqlite> SELECT * FROM plays WHERE text LIKE "whether tis nobler%";
sqlite>
That did not work. Hamlet definitely says that, but perhaps the text
formatting is slightly off. SQLite to the rescue. It ships with a Full
Text Search extension compiled in. Let us index all of Shakespeare with
FTS5:
sqlite> CREATE VIRTUAL TABLE playsearch USING fts5(playsrowid, text);
sqlite> INSERT INTO playsearch SELECT rowid, text FROM plays;
Now we can search for our soliloquy:
sqlite> SELECT rowid, text FROM playsearch WHERE text MATCH "whether tis nobler";
34232|Whether 'tis nobler in the mind to suffer
Success! The act and scene can be acquired by joining with our original
table.
sqlite> SELECT play, act, scene, line, player, plays.text
FROM playsearch
INNER JOIN plays ON playsearch.playsrowid = plays.rowid
WHERE playsearch.text MATCH "whether tis nobler";
Hamlet|3|1|65|HAMLET|Whether 'tis nobler in the mind to suffer
Letʼs clean up.
sqlite> DROP TABLE import;
sqlite> VACUUM;
Finally, what does all of this look like on the file system?
$ ls -l
-rwxr-xr-x@ 1 crawshaw staff 10188854 Apr 27 2017 shakespeare_data.csv
-rw-r--r-- 1 crawshaw staff 22286336 Jul 25 22:05 shakespeare.db
There you have it. The SQLite database contains two full copies of the
plays of Shakespeare, one with a full text search index, and stores
both of them in about twice the space it takes the original CSV file to
store one. Not bad.
That should give you a feel for the i-t-e of SQLite.
And scene.
Using SQLite from Go
The standard database/sql
There are a number of cgo-based database/sql drivers available for
SQLite. The most popular one appears to be
github.com/mattn/go-sqlite3. It gets the job done and is probably what
you want.
Using the database/sql package, it is straightforward to open an
SQLite database and execute SQL statements on it. For example, we can
run the FTS query from earlier using this Go code:
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "shakespeare.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	stmt, err := db.Prepare(`
		SELECT play, act, scene, plays.text
		FROM playsearch
		INNER JOIN plays ON playsearch.playsrowid = plays.rowid
		WHERE playsearch.text MATCH ?;`)
	if err != nil {
		log.Fatal(err)
	}

	var play, text string
	var act, scene int
	err = stmt.QueryRow("whether tis nobler").Scan(&play, &act, &scene, &text)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s %d:%d %q\n", play, act, scene, text)
}
Executing it yields:
Hamlet 3:1 "Whether 'tis nobler in the mind to suffer"
A low-level wrapper: crawshaw.io/sqlite
Just as SQLite steps beyond the basics of SELECT, INSERT, UPDATE,
DELETE with full-text search, it has several other interesting features
and extensions that cannot be accessed by SQL statements alone. These
need specialized interfaces, and many of the interfaces are not
supported by any of the existing drivers.
So I wrote my own. You can get it from crawshaw.io/sqlite. In
particular, it supports the streaming blob interface and the session
extension, and implements the necessary sqlite_unlock_notify machinery
to make good use of the shared cache for connection pools. I am going
to cover these features through two use case studies: the client and
the cloud.
cgo
All of these approaches rely on cgo for integrating C into Go. This is
straightforward to do, but adds some operational complexity. Building
a Go program using SQLite requires a C compiler for the target.
In practice, this means that if you develop on macOS you need to
install a cross-compiler for Linux.
Typical concerns about the impact on software quality of adding C code
to Go do not apply to SQLite, as it has an extraordinary degree of
testing. The quality of the code is exceptional.
Go and SQLite for the client
I am building an iOS app, with almost all the code written in Go and
the UI provided by a web view. This app has a full copy of the user
data; it is not a thin view onto an internet server. This means
storing a large amount of local, structured data, on-device full text
searching, background tasks working on the database in a way that does
not disrupt the UI, and syncing DB changes to a backup in the cloud.
That is a lot of moving parts for a client. More than I want to write
in JavaScript, and more than I want to write in Swift and then have to
promptly rewrite if I ever manage to build an Android app. More
importantly, the server is in Go, and I am one independent developer.
It is absolutely vital that I reduce the number of moving pieces in my
development environment to the smallest possible number. Hence the
effort to build (the big bits of) a client using the exact same
technology as my server.
The Session extension
The session extension lets you start a session on an SQLite
connection. All changes made to the database through that connection
are bundled into a patchset blob. The extension also provides methods
for applying the generated patchset to a table.
func (conn *Conn) CreateSession(db string) (*Session, error)
func (s *Session) Changeset(w io.Writer) error
func (conn *Conn) ChangesetApply(
r io.Reader,
filterFn func(tableName string) bool,
conflictFn func(ConflictType, ChangesetIter) ConflictAction,
) error
This can be used to build a very simple client-sync system. Collect
the changes made in a client, periodically bundle them up into a
changeset, and upload it to the server, where it is applied to a
backup copy of the database. If another client changes the database,
then the server advertises it to the client, which downloads a
changeset and applies it.
This requires a bit of care in the database design. The reason I kept
the FTS table separate in the Shakespeare example is that I keep my
FTS tables in a separate attached database (which, in SQLite, means a
different file). The cloud backup database never generates the FTS
tables; the client is free to generate the tables in a background
thread, and they can lag behind data backups.
Another point of care is minimizing conflicts. The biggest source is
AUTOINCREMENT keys. By default the primary key of a rowid table is
incremented, which means that if you have multiple clients generating
rowids you will see lots of conflicts.
I have been trialing two different solutions. The first is having each
client register a rowid range with the server and only allocate from
its own range. It works. The second is randomly generating int64
values and relying on the low collision rate. So far it works too.
Both strategies have risks, and I haven't decided which is better.
In practice, I have found I have to limit DB updates to a single
connection to keep changeset quality high. (A changeset does not see
changes made on other connections.) To do this I maintain a read-only
pool of connections and a single guarded read-write connection in a
pool of 1. The code only grabs the read-write connection when it needs
it, and the read-only connections are enforced by the read-only bit on
the SQLite connection.
Nested Transactions
The database/sql driver encourages the use of SQL transactions with
its Tx type, but that does not appear to play well with nested
transactions. Nested transactions are implemented by SAVEPOINT /
RELEASE in SQL, and they make for surprisingly composable code.
If a function needs to make multiple statements in a transaction, it
can open with a SAVEPOINT and then defer a call that issues RELEASE if
the function returns no error, or issues ROLLBACK and returns the
error if it does.
func f(conn *sqlite.Conn) (err error) {
        // execute "SAVEPOINT f" on conn
        defer func() {
                if err == nil {
                        // execute "RELEASE f" on conn
                } else {
                        // execute "ROLLBACK TO f; RELEASE f" on conn
                }
        }()
        // ... transactional statements on conn ...
}
Now if this transactional function f needs to call another
transactional function g, then g can use exactly the same strategy and
f can call it in a very traditional Go way:
if err := g(conn); err != nil {
return err // all changes in f will be rolled back by the defer
}
The function g is also perfectly safe to use in its own right, as it
has its own transaction.
I have been using this SAVEPOINT + defer RELEASE or return an error
semantics for several months now and find it invaluable. It makes it
easy to safely wrap code in SQL transactions.
The example above however is a bit bulky, and there are some edge cases
that need to be handled. (For example, if the RELEASE fails, then an
error needs to be returned.) So I have wrapped this up in a utility:
func f(conn *sqlite.Conn) (err error) {
defer sqlitex.Save(conn)(&err)
// Code is transactional and can be stacked
// with other functions that call sqlitex.Save.
}
The first time you see sqlitex.Save in action it can be a little
off-putting; at least it was for me when I first created it. But I
quickly got used to it, and it does a lot of heavy lifting. The first
call to sqlitex.Save opens a SAVEPOINT on the conn and returns a
closure that either RELEASEs or ROLLBACKs depending on the value of
err, and sets err if necessary.
Go and SQLite in the cloud
I have spent several months now redesigning services I have encountered
before and designing services for problems I would like to work on
going forward. The process has led me to a general design that works
for many problems and I quite enjoy building.
It can be summarized as 1 VM, 1 Zone, 1 process programming.
If this sounds ridiculously simplistic to you, I think thatʼs good! It
is simple. It does not meet all sorts of requirements that we would
like our modern fancy cloud services to meet. It is not "serverless",
which means when a service is extremely small it does not run for free,
and when a service grows it does not automatically scale. Indeed, there
is an explicit scaling limit. Right now the best server you can get
from Amazon is roughly:
* 128 CPU threads at ~4GHz
* 4TB RAM
* 25 Gbit ethernet
* 10 Gbps NAS
* hours of yearly downtime
That is a huge potential downside of one process programming. However,
I claim that it is a livable limit.
I claim typical services do not hit this scaling limit.
If you are building a small business, most products can grow and become
profitable well under this limit for years. When you see the limit
approaching in the next year or two, you have a business with revenue
to hire more than one engineer, and the new team can, in the face of
radically changing business requirements, rewrite the service.
Reaching this limit is a good problem to have because when it comes you
will have plenty of time to deal with it and the human resources you
need to solve it well.
Early in the life of a small business you donʼt, and every hour you
spend trying to work beyond this scaling limit is an hour that would
have been better spent talking to your customers about their needs.
The principle at work here is:
Donʼt use N computers when 1 will do.
To go into a bit more technical detail,
I run a single VM on AWS, in a single availability zone. The VM has
three EBS volumes (this is Amazon's name for network-attached
storage). The first holds the OS, logs, temporary files, and any
ephemeral SQLite databases that are generated from the main databases,
e.g. FTS tables. The second holds the primary SQLite database for the
main service. The third holds the customer sync SQLite databases.
The system is configured to periodically snapshot the system EBS volume
and the customer EBS volumes to S3, the Amazon geo-redundant blob
store. This is a relatively cheap operation that can be scripted,
because only blocks that change are copied.
The main EBS volume is backed up to S3 very regularly, by custom code
that flushes the WAL cache. Iʼll explain that in a bit.
The service is a single Go binary running on this VM. The machine has
plenty of extra RAM that is used by Linux's disk cache. (And that can
be used by a second copy of the service spinning up for low-downtime
replacement.)
The result of this is a service that has at most tens of hours of
downtime a year, about as much chance of suffering block loss as a
physical computer with a RAID5 array, and active offsite backups being
made every few minutes to a distributed system that is built and
maintained by a large team.
This system is astonishingly simple. I shell into one machine. It is a
Linux machine. I have a deploy script for the service that is ten
lines long. Almost all of my performance work is done with pprof.
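Wiring pprof into a service like this is a one-line import from the standard library. A minimal sketch (the localhost:6060 port is conventional, not required; the httptest probe here just demonstrates that the handlers got registered):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	// Imported for its side effect: registers /debug/pprof/* handlers
	// on http.DefaultServeMux.
	_ "net/http/pprof"
)

// pprofStatus probes one of the registered pprof endpoints via a
// throwaway test server and returns its HTTP status code.
func pprofStatus() int {
	srv := httptest.NewServer(http.DefaultServeMux)
	defer srv.Close()
	resp, err := http.Get(srv.URL + "/debug/pprof/cmdline")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

func main() {
	// In a real service you would instead run something like:
	//   go http.ListenAndServe("localhost:6060", nil)
	// on a private port, then point `go tool pprof` at it.
	fmt.Println("pprof responds with status", pprofStatus())
}
```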
On a medium-sized VM I can clock 5-6 thousand concurrent requests with
only a few hours of performance tuning. On the largest machine AWS
has, tens of thousands.
Now to talk a little more about the particulars of the stack:
Shared cache and WAL
To make the server extremely concurrent there are two important SQLite
features I use. The first is the shared cache, which lets me allocate
one large pool of memory to the database page cache and many concurrent
connections can use it simultaneously. This requires some support in
the driver for sqlite_unlock_notify so user code doesnʼt need to deal
with locking events, but that is transparent to end user code.
The second is the Write Ahead Log. This is a mode SQLite can be
switched into at the beginning of a connection, which changes the way
it writes
transactions to disk. Instead of locking the database and making
modifications along with a rollback journal, it appends the new change
to a separate file. This allows readers to work concurrently with the
writer. The WAL has to be flushed periodically by SQLite, which
involves locking the database and writing the changes from it. There
are default settings for doing this.
I override these and execute WAL flushes manually from a package that,
when it is done, also triggers an S3 snapshot. This package is called
reallyfsync, and if I can work out how to test it properly I will make
it open source.
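The SQLite side of that setup can be sketched with a few pragmas. This is a sketch of the general approach only — the actual statements and checkpoint policy in reallyfsync may differ:

```sql
PRAGMA journal_mode = WAL;       -- append transactions to the -wal file
PRAGMA wal_autocheckpoint = 0;   -- disable SQLite's default periodic flush

-- Later, from the manual flush routine, before triggering the S3 snapshot:
PRAGMA wal_checkpoint(TRUNCATE); -- write WAL pages back into the main db file
```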
Incremental Blob API
Another feature, smaller but important to my particular server, is
SQLiteʼs [10]incremental blob API. This allows a field of bytes to be
read and written in the DB without storing all the bytes in memory
simultaneously, which matters when it is possible for each request to
be working with hundreds of megabytes, but you want tens of thousands
of potential concurrent requests.
This is one of the places where the driver deviates from being a
close-to-cgo wrapper to be more [11]Go-like:
type Blob
func (blob *Blob) Close() error
func (blob *Blob) Read(p []byte) (n int, err error)
func (blob *Blob) ReadAt(p []byte, off int64) (n int, err error)
func (blob *Blob) Seek(offset int64, whence int) (int64, error)
func (blob *Blob) Size() int64
func (blob *Blob) Write(p []byte) (n int, err error)
func (blob *Blob) WriteAt(p []byte, off int64) (n int, err error)
This looks a lot like a file, and indeed can be used like a file, with
one caveat: the size of a blob is set when it is created. (As such, I
still find temporary files to be useful.)
Designing with one process programming
I start with: Do you really need N computers?
Some problems really do. For example, you cannot build a low-latency
index of the public internet with only 4TB of RAM. You need a lot more.
These problems are great fun, and we like to talk a lot about them, but
they are a relatively small fraction of all the code written. So far all
the projects I have been developing post-Google fit on 1 computer.
There are also more common sub-problems that are hard to solve with one
computer. If you have a global customer base and need low-latency to
your server, the speed of light gets in the way. But many of these
problems can be solved with relatively straightforward CDN products.
Another great solution to the speed of light is geo-sharding. Have
complete and independent copies of your service in multiple
datacenters, move your userʼs data to the service near them. This can
be as easy as having one small global redirect database (maybe SQLite
on geo-redundant NFS!) redirecting the user to a specific DNS name like
{us-east, us-west}.mservice.com.
Most problems do fit in one computer, up to a point. Spend some time
determining where that point is. If it is years away there is a good
chance one computer will do.
Indie dev techniques for the corporate programmer
Even if you do not write code in this particular technology stack and
you are not an independent developer, there is value here. Use the
one-big-VM, one-zone, one-process stack of Go, SQLite, and snapshot
backups as a hypothetical tool to test your designs.
So add a hypothetical step to your design process: If you solved your
problem on this stack with one computer, how far could you get? How
many customers could you support? At what size would you need to
rewrite your software?
If this indie mini stack would last your business years, you might want
to consider delaying the adoption of modern cloud software.
If you are a programmer at a well-capitalized company, you may also
want to consider what development looks like for small internal or
experimental projects. Do your coworkers have to use large complex
distributed systems for policy reasons? Many of these projects will
never need to scale beyond one computer, or if they do they will need a
rewrite to deal with shifting requirements. In which case, find a way
to make an indie stack, Linux VMs with a file system, available for
prototyping and experimentation.
__________________________________________________________________
[12]Index
[13]github.com/crawshaw
[14]twitter.com/davidcrawshaw
david@zentus.com
References
1. file:///atom.xml
2. https://gonorthwest.io/
3. https://www.kaggle.com/kingburrito666/shakespeare-plays
4. https://golang.org/pkg/database/sql
5. https://github.com/mattn/go-sqlite3
6. https://crawshaw.io/sqlite
7. https://www.sqlite.org/sessionintro.html
8. https://www.sqlite.org/sharedcache.html
9. https://www.posticulous.com/
10. https://www.sqlite.org/c3ref/blob_open.html
11. https://godoc.org/crawshaw.io/sqlite#Blob
12. file:///
13. https://github.com/crawshaw
14. https://twitter.com/davidcrawshaw