add friends ref
This commit is contained in:
@@ -5,6 +5,11 @@ draft: false
|
||||
canonical_url: https://www.viget.com/articles/friends-undirected-graph-connections-in-rails/
|
||||
featured: true
|
||||
exclude_music: true
|
||||
references:
|
||||
- title: "Storing graphs in the database: SQL meets social network - Inviqa"
|
||||
url: https://inviqa.com/blog/storing-graphs-database-sql-meets-social-network
|
||||
date: 2024-01-03T21:44:24Z
|
||||
file: inviqa-com-kztbkj.txt
|
||||
---
|
||||
|
||||
No, sorry, not THOSE friends. But if you're interested in how to do
|
||||
|
||||
747
static/archive/inviqa-com-kztbkj.txt
Normal file
747
static/archive/inviqa-com-kztbkj.txt
Normal file
@@ -0,0 +1,747 @@
|
||||
#[1]alternate
|
||||
|
||||
IFRAME: [2]https://www.googletagmanager.com/ns.html?id=GTM-NBN52P
|
||||
|
||||
[3]Skip to main content
|
||||
|
||||
Search ______________________________
|
||||
Search
|
||||
[4]Home
|
||||
|
||||
Main navigation (BUTTON) Menu
|
||||
* [5]Who we are
|
||||
+ [6]About Inviqa
|
||||
+ [7]About Havas
|
||||
+ [8]Our Sustainability Journey
|
||||
* [9]What we do
|
||||
+ [10]Digital Strategy Consulting
|
||||
+ [11]Digital Roadmap Development
|
||||
+ [12]Digital Product Design
|
||||
+ [13]User Research
|
||||
+ [14]Usability Testing
|
||||
+ [15]Technical Architecture Consulting & Development
|
||||
+ [16]Digital Platform Implementation
|
||||
+ [17]Experience Optimisation
|
||||
+ [18]All services
|
||||
* [19]Case studies
|
||||
+ [20]B2B case studies
|
||||
+ [21]Fashion & Luxury case studies
|
||||
+ [22]Not-For-Profit case studies
|
||||
+ [23]Retail & DTC case studies
|
||||
+ [24]Sport, Leisure & Entertainment case studies
|
||||
+ [25]Travel & Hotels case studies
|
||||
+ [26]All case studies
|
||||
* [27]Partners
|
||||
+ [28]Akeneo
|
||||
+ [29]BigCommerce
|
||||
+ [30]Drupal
|
||||
+ [31]Magento / Adobe Commerce
|
||||
+ [32]Spryker
|
||||
+ [33]All partners
|
||||
* [34]Careers
|
||||
+ [35]Life at Inviqa
|
||||
+ [36]Current Vacancies
|
||||
* [37]Insights
|
||||
+ [38]DTC Ecommerce Report 2023
|
||||
+ [39]PIM Readiness Framework
|
||||
+ [40]Retail Optimisation Whitepaper
|
||||
+ [41]Blog
|
||||
+ [42]All insights
|
||||
* [43]Contact
|
||||
+ [44]Get in Touch
|
||||
|
||||
* [45]EN
|
||||
* [46]DE
|
||||
|
||||
Storing graphs in the database: SQL meets social network
|
||||
|
||||
By Lorenzo Alberton
|
||||
7 September 2009 [47]Technology engineering
|
||||
|
||||
Graphs are ubiquitous. Social or P2P networks, thesauri, route planning
|
||||
systems, recommendation systems, collaborative filtering, even the
|
||||
World Wide Web itself is ultimately a graph!
|
||||
|
||||
Given their importance, it's surely worth spending some time in
|
||||
studying some algorithms and models to represent and work with them
|
||||
effectively. In this short article, we're going to see how we can store
|
||||
a graph in a DBMS. Given how much attention my talk about storing a
|
||||
tree data structure in the db received, it's probably going to be
|
||||
interesting to many. Unfortunately, the Tree models/techniques do not
|
||||
apply to generic graphs, so let's discover how we can deal with them.
|
||||
|
||||
What's a graph
|
||||
|
||||
A graph is a set of nodes (vertices) interconnected by links (edges).
|
||||
When the edges have no orientation, the graph is called an undirected
|
||||
graph. In contrast, a graph where the edges have a specific orientation
|
||||
from a node to another is called directed:
|
||||
" "
|
||||
|
||||
A graph is called complete when there's an edge between any two
|
||||
nodes, dense when the number of edges is close to the maximal number of
|
||||
edges, and sparse when it has only a few edges:
|
||||
" "
|
||||
|
||||
Representing a graph
|
||||
|
||||
Two main data structures for the representation of graphs are used in
|
||||
practice. The first is called an adjacency list, and is implemented as
|
||||
an array with one linked list for each source node, containing the
|
||||
destination nodes of the edges that leave each node. The second is a
|
||||
two-dimensional boolean adjacency matrix, in which the rows and columns
|
||||
are the source and destination vertices, and entries in the array
|
||||
indicate whether an edge exists between the vertices. Adjacency lists
|
||||
are preferred for sparse graphs; otherwise, an adjacency matrix is a
|
||||
good choice. [1]
|
||||
" " " "
|
||||
|
||||
When dealing with databases, most of the times the adjacency matrix is
|
||||
not a viable option, for two reasons: there is a hard limit in the
|
||||
number of columns that a table can have, and adding or removing a node
|
||||
requires a DDL statement.
|
||||
|
||||
Joe Celko dedicates a short chapter to graphs in his '[48]SQL for
|
||||
Smarties' book, but the topic is treated in a quite hasty way, which is
|
||||
surprising given his usual high standards.
|
||||
|
||||
One of the basic rules of a successful representation is to separate
|
||||
the nodes and the edges, to avoid [49]DKNF problems. Thus, we create
|
||||
two tables:
|
||||
CREATE TABLE nodes (
|
||||
id INTEGER PRIMARY KEY,
|
||||
name VARCHAR(10) NOT NULL,
|
||||
feat1 CHAR(1), -- e.g., age
|
||||
feat2 CHAR(1) -- e.g., school attended or company
|
||||
);
|
||||
|
||||
CREATE TABLE edges (
|
||||
a INTEGER NOT NULL REFERENCES nodes(id) ON UPDATE CASCADE ON DELETE CASCADE,
|
||||
b INTEGER NOT NULL REFERENCES nodes(id) ON UPDATE CASCADE ON DELETE CASCADE,
|
||||
PRIMARY KEY (a, b)
|
||||
);
|
||||
|
||||
CREATE INDEX a_idx ON edges (a);
|
||||
CREATE INDEX b_idx ON edges (b);
|
||||
|
||||
The first table (nodes) contains the actual node payload, with all the
|
||||
interesting information we need to store about a node (in the
|
||||
example, feat1 and feat2 represent two node features, like the age of
|
||||
the person, or the location, etc.).
|
||||
|
||||
If we want to represent an undirected graph, we need to add a CHECK
|
||||
constraint on the uniqueness of the pair.
|
||||
|
||||
Since the SQL standard does not allow a subquery in the CHECK
|
||||
constraint, we first create a function and then we use it in the CHECK
|
||||
constraint (this example is for PostgreSQL, but can be easily ported to
|
||||
other DBMS):
|
||||
CREATE FUNCTION check_unique_pair(IN id1 INTEGER, IN id2 INTEGER) RETURNS INTEGE
|
||||
R AS $body$
|
||||
DECLARE retval INTEGER DEFAULT 0;
|
||||
BEGIN
|
||||
SELECT COUNT(*) INTO retval FROM (
|
||||
SELECT * FROM edges WHERE a = id1 AND b = id2
|
||||
UNION ALL
|
||||
SELECT * FROM edges WHERE a = id2 AND b = id1
|
||||
) AS pairs;
|
||||
RETURN retval;
|
||||
END
|
||||
$body$
|
||||
LANGUAGE 'plpgsql';
|
||||
|
||||
ALTER TABLE edges ADD CONSTRAINT unique_pair CHECK (check_unique_pair(a, b) < 1)
|
||||
;
|
||||
|
||||
NB: a UDF in a CHECK constraint might be a bit slow [4]. An alternative
|
||||
is to have a materialized view [5] or force an order in the node pair
|
||||
(i.e. "CHECK (a < b)", and then using a stored procedure to insert the
|
||||
nodes in the correct order).
|
||||
|
||||
If we also want to prevent self-loops (i.e. a node linking to itself),
|
||||
we can add another CHECK constraint:
|
||||
ALTER TABLE edges ADD CONSTRAINT no_self_loop CHECK (a <> b)
|
||||
|
||||
" " " "
|
||||
|
||||
Traversing the graph
|
||||
|
||||
Now that we know how to store the graph, we might want to know which
|
||||
nodes are connected. Listing the directly connected nodes is very
|
||||
simple:
|
||||
SELECT *
|
||||
FROM nodes n
|
||||
LEFT JOIN edges e ON n.id = e.b
|
||||
WHERE e.a = 1; -- retrieve nodes connected to node 1
|
||||
|
||||
or, in the case of undirected edges:
|
||||
SELECT * FROM nodes WHERE id IN (
|
||||
SELECT a FROM edges WHERE b = 1
|
||||
UNION
|
||||
SELECT b FROM edges WHERE a = 1
|
||||
);
|
||||
|
||||
-- or alternatively:
|
||||
|
||||
SELECT * FROM nodes where id IN (
|
||||
SELECT CASE WHEN a = 1 THEN b ELSE a END
|
||||
FROM edges
|
||||
WHERE 1 IN (a, b)
|
||||
);
|
||||
|
||||
Traversing the full graph usually requires more than a query: we can
|
||||
either loop through the connected nodes, one level a time, or we can
|
||||
create a temporary table holding all the possible paths between two
|
||||
nodes.
|
||||
|
||||
We could use Oracle’s CONNECT BY syntax or SQL standard’s Common Table
|
||||
Expressions (CTEs) to recurse through the nodes, but since the graph
|
||||
can contain loops, we’d get errors (unless we’re very careful, as we’ll
|
||||
see in a moment).
|
||||
|
||||
Kendall Willets [2] proposes a way of traversing (BFS) the graph using
|
||||
a temporary table. It is quite robust, since it doesn’t fail on graphs
|
||||
with cycles (and when dealing with trees, he shows there are better
|
||||
algorithms available). His solution is just one of the many available,
|
||||
but quite good.
|
||||
|
||||
The problem with temporary tables holding all the possible paths is it
|
||||
has to be maintained. Depending on how frequently the data is accessed
|
||||
and updated it might still be worth it, but it’s quite expensive. If
|
||||
you do resort to such a solution, these references may be of use [13]
|
||||
[14].
|
||||
|
||||
Before going further in our analysis, we need to introduce a new
|
||||
concept: the transitive closure of a graph.
|
||||
|
||||
Transitive closure
|
||||
|
||||
The transitive closure of a graph G = (V,E) is a graph G* = (V,E*) such
|
||||
that E* contains an edge (u,v) if and only if G contains a path from u
|
||||
to v.
|
||||
|
||||
In other words, the transitive closure of a graph is a graph which
|
||||
contains an edge (u,v) whenever there is a directed path from u to v.
|
||||
" "
|
||||
|
||||
Graph: transitive closure
|
||||
|
||||
As already mentioned, SQL has historically been unable [3] to express
|
||||
recursive functions needed to maintain the transitive closure of a
|
||||
graph without an auxiliary table. There are many solutions to solve
|
||||
this problem with a temporary table (some even elegant [2]), but I
|
||||
still haven't found one to do it dynamically.
|
||||
|
||||
Here's my clumsy attempt at a possible solution using CTEs
|
||||
|
||||
First, this is how we can write the WITH RECURSIVE statement for a
|
||||
Directed (Cyclic) Graph:
|
||||
WITH RECURSIVE transitive_closure(a, b, distance, path_string) AS
|
||||
( SELECT a, b, 1 AS distance,
|
||||
a || '.' || b || '.' AS path_string
|
||||
FROM edges
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string
|
||||
FROM edges AS e
|
||||
JOIN transitive_closure AS tc
|
||||
ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
)
|
||||
SELECT * FROM transitive_closure
|
||||
ORDER BY a, b, distance;
|
||||
|
||||
Notice the WHERE condition, which stops the recursion in the presence
|
||||
of loops. This is very important to avoid errors.
|
||||
|
||||
Sample output:
|
||||
" "
|
||||
|
||||
This is a slightly modified version of the same query to deal with
|
||||
Undirected graphs (NB: this is probably going to be rather slow if done
|
||||
in real time):
|
||||
-- DROP VIEW edges2;
|
||||
CREATE VIEW edges2 (a, b) AS (
|
||||
SELECT a, b FROM edges
|
||||
UNION ALL
|
||||
SELECT b, a FROM edges
|
||||
);
|
||||
|
||||
WITH RECURSIVE transitive_closure(a, b, distance, path_string) AS
|
||||
( SELECT a, b, 1 AS distance,
|
||||
a || '.' || b || '.' AS path_string
|
||||
FROM edges2
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string
|
||||
FROM edges2 AS e
|
||||
JOIN transitive_closure AS tc ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
)
|
||||
SELECT * FROM transitive_closure
|
||||
ORDER BY a, b, distance;
|
||||
|
||||
Linkedin: Degrees of separation
|
||||
|
||||
One of the fundamental characteristics of networks (or graphs in
|
||||
general) is connectivity. We might want to know how to go from A to B,
|
||||
or how two people are connected, and we also want to know how many
|
||||
"hops" separate two nodes, to have an idea about the distance.
|
||||
|
||||
For instance, social networks like LinkedIN show our connections or
|
||||
search results sorted by degree of separation, and trip planning sites
|
||||
show how many flights you have to take to reach your destination,
|
||||
usually listing direct connections first.
|
||||
|
||||
There are some database extensions or hybrid solutions like SPARQL on
|
||||
Virtuoso [11] that add a TRANSITIVE clause [12] to make this kind of
|
||||
queries both easy and efficient, but we want to see how to reach the
|
||||
same goal with standard SQL.
|
||||
|
||||
As you might guess, this becomes really easy once you have the
|
||||
transitive closure of the graph, we only have to add a WHERE clause
|
||||
specifying what our source and destination nodes are:
|
||||
WITH RECURSIVE transitive_closure(a, b, distance, path_string) AS
|
||||
( SELECT a, b, 1 AS distance,
|
||||
a || '.' || b || '.' AS path_string
|
||||
FROM edges
|
||||
WHERE a = 1 -- source
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string
|
||||
FROM edges AS e
|
||||
JOIN transitive_closure AS tc ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
)
|
||||
SELECT * FROM transitive_closure
|
||||
WHERE b=6 -- destination
|
||||
ORDER BY a, b, distance;
|
||||
|
||||
" "
|
||||
|
||||
If we're showing the trip planning results, then we have a list of all
|
||||
possible travel solutions; instead of sorting by distance, we might
|
||||
sort by price or other parameters with little changes.
|
||||
|
||||
If we're showing how two people are connected (LinkedIN), then we can
|
||||
limit the result set to the first row, since we're probably interested
|
||||
in showing the shortest distance only and not all the other
|
||||
alternatives.
|
||||
|
||||
Instead of adding a LIMIT clause, it's probably more efficient to add
|
||||
"AND tc.distance = 0" to the WHERE clause of the recursive part of the
|
||||
CTE, or a GROUP BY clause as follows:
|
||||
WITH RECURSIVE transitive_closure(a, b, distance, path_string)
|
||||
AS
|
||||
( SELECT a, b, 1 AS distance,
|
||||
a || '.' || b || '.' AS path_string
|
||||
FROM edges2
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string
|
||||
FROM edges2 AS e
|
||||
JOIN transitive_closure AS tc ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
)
|
||||
SELECT a, b, min(distance) AS dist FROM transitive_closure
|
||||
--WHERE a = 1 AND b=6
|
||||
GROUP BY a, b
|
||||
ORDER BY a, dist, b;
|
||||
|
||||
" "
|
||||
|
||||
If you are interested in the immediate connections of a certain node,
|
||||
then specify the starting node and a distance equals to one (by
|
||||
limiting the recursion at the first level)
|
||||
WITH RECURSIVE transitive_closure(a, b, distance, path_string) AS
|
||||
( SELECT a, b, 1 AS distance, a || '.' || b || '.' AS path_string
|
||||
FROM edges2
|
||||
WHERE a = 1 -- set the starting node
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string
|
||||
FROM edges2 AS e
|
||||
JOIN transitive_closure AS tc ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
AND tc.distance = 0 -- limit recursion at the first level
|
||||
)
|
||||
SELECT b FROM transitive_closure;
|
||||
|
||||
Of course to get the immediate connections there's no need for a
|
||||
recursive query (just use the one presented at the previous paragraph),
|
||||
but I thought I'd show it anyway as a first step towards more complex
|
||||
queries.
|
||||
|
||||
LinkedIN has a nice feature to show "How this user is connected to you"
|
||||
for non directly connected nodes.
|
||||
|
||||
If the distance between the two nodes is equal to 2, you can show the
|
||||
shared connections:
|
||||
SELECT b FROM (
|
||||
|
||||
WITH RECURSIVE transitive_closure(a, b, distance, path_string) AS
|
||||
( SELECT a, b, 1 AS distance, a || '.' || b || '.' AS path_string
|
||||
FROM edges2
|
||||
WHERE a = 1 -- set the starting node
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string
|
||||
FROM edges2 AS e
|
||||
JOIN transitive_closure AS tc ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
AND tc.distance = 0
|
||||
)
|
||||
SELECT b FROM transitive_closure
|
||||
|
||||
UNION ALL
|
||||
|
||||
(WITH RECURSIVE transitive_closure(a, b, distance, path_string) AS
|
||||
( SELECT a, b, 1 AS distance, a || '.' || b || '.' AS path_string
|
||||
FROM edges2
|
||||
WHERE a = 4 -- set the target node
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string
|
||||
FROM edges2 AS e
|
||||
JOIN transitive_closure AS tc ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
AND tc.distance = 0
|
||||
)
|
||||
SELECT b FROM transitive_closure
|
||||
)) AS immediate_connections
|
||||
GROUP BY b
|
||||
HAVING COUNT(b) > 1;
|
||||
|
||||
In the above query, we select the immediate connections of the two
|
||||
nodes separately, and then select the shared ones.
|
||||
|
||||
For nodes having a distance equals to 3, the approach is slightly
|
||||
different.
|
||||
|
||||
First, you check that the two nodes are indeed at a minimum distance of
|
||||
3 nodes (you're probably not interested in showing the relationship
|
||||
between two nodes when the distance is bigger):
|
||||
WITH RECURSIVE transitive_closure(a, b, distance, path_string) AS
|
||||
( SELECT a, b, 1 AS distance,
|
||||
a || '.' || b || '.' AS path_string
|
||||
FROM edges2
|
||||
WHERE a = 1 -- set the starting node
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string
|
||||
FROM edges2 AS e
|
||||
JOIN transitive_closure AS tc ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
AND tc.distance < 3 -- stop the recursion after 3 levels
|
||||
)
|
||||
SELECT a, b, min(distance) FROM transitive_closure
|
||||
WHERE b=4 -- set the target node
|
||||
GROUP BY a, b
|
||||
HAVING min(distance) = 3; --set the minimum distance
|
||||
|
||||
Then you select the paths between those nodes.
|
||||
|
||||
But there's a different approach which is more generic and efficient,
|
||||
and can be used for all the nodes whose distance is bigger than 2.
|
||||
|
||||
The idea is to select the immediate neighbours of the starting node
|
||||
that are also in the path to the other node.
|
||||
|
||||
Depending on the distance, you can have either the shared nodes
|
||||
(distance = 2), or the connections that could lead to the other node
|
||||
(distance > 2). In the latter case, you could for instance show how A
|
||||
is connected to B:
|
||||
" "
|
||||
|
||||
Linkedin
|
||||
WITH RECURSIVE transitive_closure(a, b, distance, path_string) AS
|
||||
( SELECT a, b, 1 AS distance,
|
||||
a || '.' || b || '.' AS path_string,
|
||||
b AS direct_connection
|
||||
FROM edges2
|
||||
WHERE a = 1 -- set the starting node
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string,
|
||||
tc.direct_connection
|
||||
FROM edges2 AS e
|
||||
JOIN transitive_closure AS tc ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
AND tc.distance < 3
|
||||
)
|
||||
SELECT * FROM transitive_closure
|
||||
--WHERE b=3 -- set the target node
|
||||
ORDER BY a,b,distance
|
||||
|
||||
" "
|
||||
|
||||
Facebook: You might also know
|
||||
|
||||
A similar but slightly different requirement is to find those nodes
|
||||
that are most strongly related, but not directly connected yet. In
|
||||
other words, it's interesting to find out which and how many connected
|
||||
nodes are shared between any two nodes, i.e. how many 'friends' are
|
||||
shared between two individuals. Or better yet, to find those nodes
|
||||
sharing a certain (minimum) number of nodes with the current one.
|
||||
|
||||
This could be useful to suggest a new possible friend, or in the case
|
||||
of recommendation systems, to suggest a new item/genre that matches the
|
||||
user's interests.
|
||||
|
||||
There are many ways of doing this. In theory, this is bordering on the
|
||||
domain of collaborative filtering [6][7][8], so using Pearson's
|
||||
correlation [9] or a similar distance measure with an appropriate
|
||||
algorithm [10] is going to generate the best results. Collaborative
|
||||
filtering is an incredibly interesting topic on its own, but outside
|
||||
the scope of this article.
|
||||
|
||||
A rough and inexpensive alternative is to find the nodes having
|
||||
distance equals to 2, and filter those that either have a common
|
||||
characteristic with the source node (went to the same school / worked
|
||||
at the same company, belong to the same interest group / are items of
|
||||
the same genre) or have several mutual 'friends'.
|
||||
" "
|
||||
|
||||
Facebook
|
||||
|
||||
This, again, is easily done once you have the transitive closure of the
|
||||
graph:
|
||||
SELECT a AS you,
|
||||
b AS mightknow,
|
||||
shared_connection,
|
||||
CASE
|
||||
WHEN (n1.feat1 = n2.feat1 AND n1.feat1 = n3.feat1) THEN 'feat1 in commo
|
||||
n'
|
||||
WHEN (n1.feat2 = n2.feat2 AND n1.feat2 = n3.feat2) THEN 'feat2 in commo
|
||||
n'
|
||||
ELSE 'nothing in common'
|
||||
END AS reason
|
||||
FROM (
|
||||
WITH RECURSIVE transitive_closure(a, b, distance, path_string) AS
|
||||
( SELECT a, b, 1 AS distance,
|
||||
a || '.' || b || '.' AS path_string,
|
||||
b AS direct_connection
|
||||
FROM edges2
|
||||
WHERE a = 1 -- set the starting node
|
||||
|
||||
UNION ALL
|
||||
|
||||
SELECT tc.a, e.b, tc.distance + 1,
|
||||
tc.path_string || e.b || '.' AS path_string,
|
||||
tc.direct_connection
|
||||
FROM edges2 AS e
|
||||
JOIN transitive_closure AS tc ON e.a = tc.b
|
||||
WHERE tc.path_string NOT LIKE '%' || e.b || '.%'
|
||||
AND tc.distance < 2
|
||||
)
|
||||
SELECT a,
|
||||
b,
|
||||
direct_connection AS shared_connection
|
||||
FROM transitive_closure
|
||||
WHERE distance = 2
|
||||
) AS youmightknow
|
||||
LEFT JOIN nodes AS n1 ON youmightknow.a = n1.id
|
||||
LEFT JOIN nodes AS n2 ON youmightknow.b = n2.id
|
||||
LEFT JOIN nodes AS n3 ON youmightknow.shared_connection = n3.id
|
||||
WHERE (n1.feat1 = n2.feat1 AND n1.feat1 = n3.feat1)
|
||||
OR (n1.feat2 = n2.feat2 AND n1.feat2 = n3.feat2);
|
||||
|
||||
" "
|
||||
|
||||
Once you have selected these nodes, you can filter those recurring more
|
||||
often, or give more importance to those having a certain feature in
|
||||
common, or pick one randomly (so you don't end up suggesting the same
|
||||
node over and over).
|
||||
|
||||
Conclusion
|
||||
|
||||
In this article I had some fun with the new and powerful CTEs, and
|
||||
showed some practical examples where they can be useful. I also showed
|
||||
some approaches at solving the challenges faced by any social network
|
||||
or recommendation system.
|
||||
|
||||
You are advised that depending on the size of the graph and the
|
||||
performance requirements of your application, the above queries might
|
||||
be too slow to run in realtime. Caching is your friend.
|
||||
|
||||
Update: Many of the queries in this article have been revised, so
|
||||
please refer
|
||||
to [50]http://www.slideshare.net/quipo/rdbms-in-the-social-networks-age
|
||||
for changes.
|
||||
|
||||
References
|
||||
|
||||
[1] [51]http://willets.org/sqlgraphs.html
|
||||
|
||||
[2] [52]http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.53
|
||||
|
||||
[3] [53]http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/06/25
|
||||
/scalar-udfs-wrapped-in-check-constraints-are-very-slow-and-may-fail-fo
|
||||
r-multirow-updates.aspx
|
||||
|
||||
[4] [54]http://www.dbazine.com/oracle/or-articles/tropashko8
|
||||
|
||||
[5] [55]http://en.wikipedia.org/wiki/Collaborative_filtering
|
||||
|
||||
[6] [56]http://en.wikipedia.org/wiki/Slope_One
|
||||
|
||||
[7] blog.charliezhu.com/2008/07/21/implementing-slope-one-in-t-sql/
|
||||
|
||||
[8] bakara.eng.tau.ac.il/~semcomm/slides7/grouplensAlgs-Kahn.pps
|
||||
|
||||
[9] [57]http://www.slideshare.net/denisparra/evaluation-of-collaborativ
|
||||
e-filtering-algorithms-for-recommending-articles-on-citeulike
|
||||
|
||||
[10] [58]http://virtuoso.openlinksw.com/
|
||||
|
||||
[11] [59]http://www.openlinksw.com/weblog/oerling/?id=1433
|
||||
|
||||
[12] [60]http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.53
|
||||
|
||||
[13] [61]http://en.wikipedia.org/wiki/Transitive_reduction
|
||||
|
||||
You might also like...
|
||||
|
||||
A woman reviews code on her laptop
|
||||
|
||||
Headless commerce: everything you need to know
|
||||
|
||||
What the heck is headless? Discover the what, why, and when of headless
|
||||
architectures with our guide to headless commerce.
|
||||
|
||||
Drupal consulting and web development at Inviqa
|
||||
|
||||
The Drupal 9 upgrade Config Split issue and how to fix it
|
||||
|
||||
In this article we look back at an issue we’ve encountered with Drupal
|
||||
Config Split when upgrading Drupal 8 to 9 – and we share how to fix it,
|
||||
so you don’t have to run into the same issue when upgrading to Drupal
|
||||
9.
|
||||
|
||||
Inviqa, winner of Webby Awards The Webby Awards winner
|
||||
Inviqa named one of Top 100 Agencies in Econsultancy Top 100 Digital
|
||||
Agencies
|
||||
Inviqa UXUK Awards winner UXUK Awards winner
|
||||
DADI Award winner of 'Best UX / Usability category' DADI Award winner
|
||||
|
||||
Footer Main Navigation
|
||||
* [62]Home
|
||||
* [63]Who we are
|
||||
* [64]What we do
|
||||
* [65]Case studies
|
||||
* [66]Careers
|
||||
* [67]Insights
|
||||
* [68]Contact
|
||||
* [69]Accessibility statement
|
||||
|
||||
About us
|
||||
|
||||
Together with your teams, we shape the digital products, teams,
|
||||
processes, and software systems you need to meet diverse customer needs
|
||||
and accelerate your business growth.
|
||||
|
||||
© 2007-2024, Inviqa UK Ltd. Registered No. 06278367. Registered Office:
|
||||
Havas House, Hermitage Court, Hermitage Lane, Maidstone, ME16 9NT, UK.
|
||||
|
||||
Footer Legal Links
|
||||
* [70]Covid-19
|
||||
* [71]Privacy policy
|
||||
* [72]Sitemap
|
||||
|
||||
References
|
||||
|
||||
Visible links:
|
||||
1. https://inviqa.com/blog/storing-graphs-database-sql-meets-social-network
|
||||
2. https://www.googletagmanager.com/ns.html?id=GTM-NBN52P
|
||||
3. file:///var/folders/q9/qlz2w5251kzdfgn0np7z2s4c0000gn/T/L84755-9551TMP.html#main-content
|
||||
4. https://inviqa.com/
|
||||
5. https://inviqa.com/who-we-are
|
||||
6. https://inviqa.com/who-we-are
|
||||
7. https://www.havas.com/
|
||||
8. https://inviqa.com/digital-sustainability-journey
|
||||
9. https://inviqa.com/what-we-do
|
||||
10. https://inviqa.com/what-we-do/digital-strategy-consulting-and-development
|
||||
11. https://inviqa.com/what-we-do/digital-roadmap-development
|
||||
12. https://inviqa.com/what-we-do/digital-product-design
|
||||
13. https://inviqa.com/what-we-do/user-research
|
||||
14. https://inviqa.com/what-we-do/usability-testing
|
||||
15. https://inviqa.com/what-we-do/technical-architecture-consulting-and-development
|
||||
16. https://inviqa.com/what-we-do/digital-platform-consulting-and-implementation
|
||||
17. https://inviqa.com/what-we-do/experience-optimisation
|
||||
18. https://inviqa.com/what-we-do
|
||||
19. https://inviqa.com/case-studies
|
||||
20. https://inviqa.com/case-studies?category=b2b
|
||||
21. https://inviqa.com/case-studies#fashion
|
||||
22. https://inviqa.com/case-studies#charity
|
||||
23. https://inviqa.com/case-studies?category=retail
|
||||
24. https://inviqa.com/case-studies#leisure
|
||||
25. https://inviqa.com/case-studies?category=travel
|
||||
26. https://inviqa.com/case-studies
|
||||
27. https://inviqa.com/partners
|
||||
28. https://inviqa.com/akeneo-pim-consulting-and-implementation
|
||||
29. https://inviqa.com/blog/bigcommerce-7-best-sites
|
||||
30. https://inviqa.com/drupal-consulting-and-web-development
|
||||
31. https://inviqa.com/magento-consulting-and-web-development
|
||||
32. https://inviqa.com/blog/spryker-commerce-platform-introduction
|
||||
33. https://inviqa.com/partners
|
||||
34. https://careers.inviqa.com/
|
||||
35. https://careers.inviqa.com/
|
||||
36. https://careers.inviqa.com/jobs
|
||||
37. https://inviqa.com/insights
|
||||
38. https://inviqa.com/insights/dtc-ecommerce-report-2023
|
||||
39. https://inviqa.com/insights/PIM-readiness-framework
|
||||
40. https://inviqa.com/insights/retail-optimisation-guide-2023
|
||||
41. https://inviqa.com/blog
|
||||
42. https://inviqa.com/insights
|
||||
43. https://inviqa.com/contact
|
||||
44. https://inviqa.com/contact
|
||||
45. https://inviqa.com/
|
||||
46. https://inviqa.de/
|
||||
47. https://inviqa.com/blog#Technology engineering
|
||||
48. https://www.amazon.com/Joe-Celkos-SQL-Smarties-Programming/dp/0123693799/157-5667933-6571053?ie=UTF8&redirect=true&tag=postcarfrommy-20
|
||||
49. https://en.wikipedia.org/wiki/Domain-key_normal_form
|
||||
50. http://www.slideshare.net/quipo/rdbms-in-the-social-networks-age
|
||||
51. http://willets.org/sqlgraphs.html
|
||||
52. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.53
|
||||
53. http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/06/25/scalar-udfs-wrapped-in-check-constraints-are-very-slow-and-may-fail-for-multirow-updates.aspx
|
||||
54. http://www.dbazine.com/oracle/or-articles/tropashko8/
|
||||
55. https://en.wikipedia.org/wiki/Collaborative_filtering
|
||||
56. https://en.wikipedia.org/wiki/Slope_One
|
||||
57. http://www.slideshare.net/denisparra/evaluation-of-collaborative-filtering-algorithms-for-recommending-articles-on-citeulike
|
||||
58. http://virtuoso.openlinksw.com/
|
||||
59. http://www.openlinksw.com/weblog/oerling/?id=1433
|
||||
60. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.53
|
||||
61. https://en.wikipedia.org/wiki/Transitive_reduction
|
||||
62. https://inviqa.com/we-craft-game-changing-digital-experiences
|
||||
63. https://inviqa.com/who-we-are
|
||||
64. https://inviqa.com/what-we-do
|
||||
65. https://inviqa.com/case-studies
|
||||
66. https://careers.inviqa.com/
|
||||
67. https://inviqa.com/insights
|
||||
68. https://inviqa.com/contact
|
||||
69. https://inviqa.com/accessibility-statement
|
||||
70. https://inviqa.com/covid-19-measures
|
||||
71. https://inviqa.com/privacy-policy-UK
|
||||
72. https://inviqa.com/sitemap
|
||||
|
||||
Hidden links:
|
||||
74. https://inviqa.com/blog/headless-commerce-everything-you-need-know
|
||||
75. https://inviqa.com/blog/drupal-9-upgrade-config-split-issue-and-how-fix-it
|
||||
Reference in New Issue
Block a user