Pull in Viget posts

2023-10-22 23:52:56 -04:00
parent 625d374135
commit 0438a6d828
77 changed files with 8219 additions and 5 deletions
--- a/content/elsewhere/regular-expressions-in-mysql/index.md
+++ b/content/elsewhere/regular-expressions-in-mysql/index.md
@@ -0,0 +1,75 @@
+---
+title: "Regular Expressions in MySQL"
+date: 2011-09-28T00:00:00+00:00
+draft: false
+needs_review: true
+canonical_url: https://www.viget.com/articles/regular-expressions-in-mysql/
+---
+
+Did you know MySQL supports using [regular
+expressions](https://en.wikipedia.org/wiki/Regular_expression) in
+`SELECT` statements? I'm surprised at the number of developers who
+don't, despite using SQL and regexes on a daily basis. That's not to say
+that putting a regex into your SQL should be a daily occurrence. In
+fact, it can [cause more problems than it
+solves](https://en.wikiquote.org/wiki/Jamie_Zawinski#Attributed), but
+it's a handy tool to have in your belt under certain circumstances.
+
+## Basic Usage
+
+Regular expressions in MySQL are invoked with the
+[`REGEXP`](http://dev.mysql.com/doc/refman/5.1/en/regexp.html) keyword,
+aliased to `RLIKE`. The most basic usage is a hardcoded regular
+expression on the right-hand side of a conditional clause, e.g.:
+
+    SELECT * FROM users WHERE email RLIKE '^[a-c].*[0-9]@';
+
+This query returns every user whose email address begins with 'a', 'b',
+or 'c' and has a digit as the final character of its local part.
+
+## Something More Advanced
+
+The regex used with `RLIKE` does not need to be hardcoded into the SQL
+statement, and can *in fact* be a column in the table being queried. In
+a recent project, we were tasked with creating an interface for managing
+redirect rules à la
+[mod_rewrite](http://httpd.apache.org/docs/current/mod/mod_rewrite.html).
+We were able to do the entire match in the database, using SQL like this
+(albeit with a few more joins, groups and orders):
+
+    SELECT * FROM redirect_rules WHERE '/news' RLIKE pattern;
+
+In this case, '/news' is the incoming request path and `pattern` is the
+column that stores the regular expression. In our benchmarks, we found
+this approach to be much faster than doing the regular expression
+matching in Ruby, mostly because of the lack of ActiveRecord overhead.
+
+## Caveats
+
+Using regular expressions in your SQL has the potential to be slow.
+These queries can't use indexes, so a full table scan is required. If
+you can get away with `LIKE` and its simple wildcards (`%` and `_`),
+you should. As always: benchmark, benchmark, benchmark.
+
+Additionally, MySQL supports
+[POSIX](https://en.wikipedia.org/wiki/POSIX) regular expressions, not
+the Perl-style ([PCRE](http://www.pcre.org/)-like) syntax Ruby offers.
+Some things (like negative lookaheads) you simply can't do, though you
+probably ought not to be doing them in your SQL anyway.
+
+## In PostgreSQL
+
+Support for regular expressions in PostgreSQL is similar to that of
+MySQL, though the syntax is different (e.g. `email ~ '^a'` instead of
+`email RLIKE '^a'`). What's more, Postgres contains some useful
+functions for working with regular expressions, like `substring` and
+`regexp_replace`. See the
+[documentation](http://www.postgresql.org/docs/9.0/static/functions-matching.html)
+for more information.
+
+## Conclusion
+
+In certain circumstances, regular expressions in SQL are a handy
+technique that can lead to faster, cleaner code. Don't use `RLIKE` when
+`LIKE` will suffice, and be sure to benchmark your queries with datasets
+similar to the ones you'll be facing in production.
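
To try the pattern-in-a-column lookup from "Something More Advanced" without a MySQL server, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the original project's code: SQLite stands in for MySQL, the `REGEXP` operator is backed by a user-defined function that calls Python's `re` module (which is not a POSIX engine), and the `target` column and sample rows are hypothetical.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites `value REGEXP pattern` as regexp(pattern, value).
    # re.search matches anywhere in the string, much like MySQL's RLIKE.
    if pattern is None or value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def build_db():
    # In-memory database with a few hypothetical redirect rules.
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    conn.execute("CREATE TABLE redirect_rules (pattern TEXT, target TEXT)")
    conn.executemany(
        "INSERT INTO redirect_rules (pattern, target) VALUES (?, ?)",
        [
            ("^/news$", "/articles"),
            ("^/blog/[0-9]+$", "/posts"),
            ("^/about", "/company"),
        ],
    )
    return conn


def matching_targets(conn, path):
    # The incoming path is the left operand; the stored pattern is the right.
    rows = conn.execute(
        "SELECT target FROM redirect_rules WHERE ? REGEXP pattern ORDER BY target",
        (path,),
    )
    return [target for (target,) in rows]
```

Calling `matching_targets(build_db(), "/news")` runs the same shape of query as the `redirect_rules` example, with the request path bound as a parameter instead of being written into the SQL string.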