Files
2024-03-06 23:43:40 -05:00

223 lines
10 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: "Lets Make a Hash Chain in SQLite"
date: 2021-06-30T00:00:00+00:00
draft: false
canonical_url: https://www.viget.com/articles/lets-make-a-hash-chain-in-sqlite/
references:
- title: "How Blockchains Work | data Blog = Blog { me :: Programmer, posts :: [Opinion] }"
url: https://asthasr.github.io/posts/how-blockchains-work/
date: 2024-03-07T04:43:24Z
file: asthasr-github-io-s0ebht.txt
---
I'm not much of a cryptocurrency enthusiast, but there are some neat
ideas in these protocols that I wanted to explore further. Based on my
absolute layperson's understanding, the "crypto" in
"cryptocurrency" describes three things:
1. Some public key/private key stuff to grant access to funds at an
address;
2. For certain protocols (e.g. Bitcoin), the cryptographic
puzzles[^1] that miners
have to solve in order to add new blocks to the ledger; and
3. The use of hashed signatures to ensure data integrity.
Of those three uses, the first two (asymmetric cryptography and
proof-of-work) aren't that interesting to me, at least from a technical
perspective. The third concept, though --- using cryptography to make
data verifiable and tamper-resistant --- that's pretty cool, and
something I wanted to dig into. I decided to build a little
proof-of-concept using [SQLite](https://www.sqlite.org/index.html), a
"small, fast, self-contained, high-reliability, full-featured, SQL
database engine."
A couple notes before we dive in: these concepts aren't unique to the
blockchain; Wikipedia has good explanations of [cryptographic hash
functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function),
[Merkle trees](https://en.wikipedia.org/wiki/Merkle_tree), and [hash
chains](https://en.wikipedia.org/wiki/Hash_chain) if any of this piques
your curiosity. This stuff is also [at the core of
git](https://initialcommit.com/blog/git-bitcoin-merkle-tree), which is
really pretty neat.
## Onto the code
Implementing a rudimentary hash chain in SQL is pretty simple. Here's
my approach, which uses "bookmarks" as an arbitrary record type.
```sql
PRAGMA foreign_keys = ON;
SELECT load_extension("./sha1");
CREATE TABLE bookmarks (
id INTEGER PRIMARY KEY,
signature TEXT NOT NULL UNIQUE
CHECK(signature = sha1(url || COALESCE(parent, ""))),
parent TEXT,
url TEXT NOT NULL UNIQUE,
FOREIGN KEY(parent) REFERENCES bookmarks(signature)
);
CREATE UNIQUE INDEX parent_unique ON bookmarks (
ifnull(parent, "")
);
```
This code is available on
[GitHub](https://github.com/dce/sqlite-hash-chain) in case you want to
try this out on your own. Let's break it down a little bit.
- First, we enable foreign key constraints, which aren't on by
default
- Then we pull in SQLite's [`sha1`
function](https://www.i-programmer.info/news/84-database/10527-sqlite-317-adds-sha1-extension.html),
which implements a common hashing algorithm
- Then we define our table
- `id` isn't mandatory but makes it easier to grab the last entry
- `signature` is the SHA1 hash of the bookmark URL and parent
entry's signature; it uses a `CHECK` constraint to ensure this
is guaranteed to be true
- `parent` is the `signature` of the previous entry in the chain
(notice that it's allowed to be null)
- `url` is the data we want to ensure is immutable (though as
we'll see later, it's not truly immutable since we can still
do cascading updates)
- We set a foreign key constraint that `parent` refers to another
row's `signature` unless it's null
- Then we create a unique index on `parent` that covers the `NULL`
case, since our very first bookmark won't have a parent, but no
other row should be allowed to have a null parent, and no two rows
should be able to have the same parent
Next, let's insert some data:
```sql
INSERT INTO bookmarks (url, signature) VALUES ("google", sha1("google"));
WITH parent AS (SELECT signature FROM bookmarks ORDER BY id DESC LIMIT 1)
INSERT INTO bookmarks (url, parent, signature) VALUES (
"yahoo", (SELECT signature FROM parent), sha1("yahoo" || (SELECT signature FROM parent))
);
WITH parent AS (SELECT signature FROM bookmarks ORDER BY id DESC LIMIT 1)
INSERT INTO bookmarks (url, parent, signature) VALUES (
"bing", (SELECT signature FROM parent), sha1("bing" || (SELECT signature FROM parent))
);
WITH parent AS (SELECT signature FROM bookmarks ORDER BY id DESC LIMIT 1)
INSERT INTO bookmarks (url, parent, signature) VALUES (
"duckduckgo", (SELECT signature FROM parent), sha1("duckduckgo" || (SELECT signature FROM parent))
);
```
OK! Let's fire up `sqlite3` and then `.read` this file. Here's the
result:
```
sqlite> SELECT * FROM bookmarks;
+----+------------------------------------------+------------------------------------------+------------+
| id | signature | parent | url |
+----+------------------------------------------+------------------------------------------+------------+
| 1 | 759730a97e4373f3a0ee12805db065e3a4a649a5 | | google |
| 2 | 64633167b8e44cb833fbfa349731d8a68e942ebc | 759730a97e4373f3a0ee12805db065e3a4a649a5 | yahoo |
| 3 | ce3df1337879e85bc488d4cae129719cc46cad04 | 64633167b8e44cb833fbfa349731d8a68e942ebc | bing |
| 4 | 675570ac126d492e449ebaede091e2b7dad7d515 | ce3df1337879e85bc488d4cae129719cc46cad04 | duckduckgo |
+----+------------------------------------------+------------------------------------------+------------+
```
This has some cool properties. I can't delete an entry in the chain:
```
sqlite> DELETE FROM bookmarks WHERE id = 3;
Error: FOREIGN KEY constraint failed
```
I can't change a URL:
```
sqlite> UPDATE bookmarks SET url = "altavista" WHERE id = 3;
Error: CHECK constraint failed: signature = sha1(url || parent)
```
I can't re-sign an entry:
```
sqlite> UPDATE bookmarks SET url = "altavista", signature = sha1("altavista" || parent) WHERE id = 3;
Error: FOREIGN KEY constraint failed
```
I **can**, however, update the last entry in the chain:
```
sqlite> UPDATE bookmarks SET url = "altavista", signature = sha1("altavista" || parent) WHERE id = 4;
sqlite> SELECT * FROM bookmarks;
+----+------------------------------------------+------------------------------------------+-----------+
| id | signature | parent | url |
+----+------------------------------------------+------------------------------------------+-----------+
| 1 | 759730a97e4373f3a0ee12805db065e3a4a649a5 | | google |
| 2 | 64633167b8e44cb833fbfa349731d8a68e942ebc | 759730a97e4373f3a0ee12805db065e3a4a649a5 | yahoo |
| 3 | ce3df1337879e85bc488d4cae129719cc46cad04 | 64633167b8e44cb833fbfa349731d8a68e942ebc | bing |
| 4 | b583a025b5a43727978c169fe99f5422039194ea | ce3df1337879e85bc488d4cae129719cc46cad04 | altavista |
+----+------------------------------------------+------------------------------------------+-----------+
```
This is because a row isn't really "locked in" until it's pointed to
by another row. It's worth pointing out that an actual blockchain would
use a [consensus
mechanism](https://www.investopedia.com/terms/c/consensus-mechanism-cryptocurrency.asp)
to prevent any updates like this, but that's way beyond the scope of
what we're doing here.
## Cascading updates
Given that we can change the last row, it's possible to update any row
in the ledger provided you 1) also re-sign all of its children and 2) do
it all in a single pass. Here's how you'd update row 2 to
"askjeeves" with a [`RECURSIVE`
query](https://www.sqlite.org/lang_with.html#recursive_common_table_expressions)
(and sorry I know this is a little hairy):
```sql
WITH RECURSIVE
t1(url, parent, old_signature, signature) AS (
SELECT "askjeeves", parent, signature, sha1("askjeeves" || COALESCE(parent, ""))
FROM bookmarks WHERE id = 2
UNION
SELECT t2.url, t1.signature, t2.signature, sha1(t2.url || t1.signature)
FROM bookmarks AS t2, t1 WHERE t2.parent = t1.old_signature
)
UPDATE bookmarks
SET url = (SELECT url FROM t1 WHERE t1.old_signature = bookmarks.signature),
parent = (SELECT parent FROM t1 WHERE t1.old_signature = bookmarks.signature),
signature = (SELECT signature FROM t1 WHERE t1.old_signature = bookmarks.signature)
WHERE signature IN (SELECT old_signature FROM t1);
```
Here's the result of running this update:
```
+----+------------------------------------------+------------------------------------------+-----------+
| id | signature | parent | url |
+----+------------------------------------------+------------------------------------------+-----------+
| 1 | 759730a97e4373f3a0ee12805db065e3a4a649a5 | | google |
| 2 | de357e976171e528088843dfa35c1097017b1009 | 759730a97e4373f3a0ee12805db065e3a4a649a5 | askjeeves |
| 3 | 1b69dff11f3e8ffeade0f42521f9e1bd1bd78539 | de357e976171e528088843dfa35c1097017b1009 | bing |
| 4 | 924660e4f25e2ac8c38ca25bae201ad3a5b6e545 | 1b69dff11f3e8ffeade0f42521f9e1bd1bd78539 | altavista |
+----+------------------------------------------+------------------------------------------+-----------+
```
As you can see, row 2's `url` is updated, and rows 3 and 4 have updated
signatures and parents. Pretty cool, and pretty much the same thing as
what happens when you change a git commit via `rebase` --- all the
successive commits get new SHAs.
---
I'll be honest that I don't have any immediately practical uses for a
cryptographically-signed database table, but I thought it was cool and
helped me understand these concepts a little bit better. Hopefully it
gets your mental wheels spinning a little bit, too. Thanks for reading!
[^1]: [Here's a pretty good explanation of what mining really is](https://asthasr.github.io/posts/how-blockchains-work/), but, in a nutshell, it's running a hashing algorithm over and over again
with a random salt until a hash is found that begins with a required number of zeroes.