Pull in Viget posts
75
content/elsewhere/regular-expressions-in-mysql/index.md
Normal file
@@ -0,0 +1,75 @@
---
title: "Regular Expressions in MySQL"
date: 2011-09-28T00:00:00+00:00
draft: false
needs_review: true
canonical_url: https://www.viget.com/articles/regular-expressions-in-mysql/
---

Did you know MySQL supports using [regular
expressions](https://en.wikipedia.org/wiki/Regular_expression) in
`SELECT` statements? I'm surprised at the number of developers who
don't, despite using SQL and regexes on a daily basis. That's not to say
that putting a regex into your SQL should be a daily occurrence. In
fact, it can [cause more problems than it
solves](https://en.wikiquote.org/wiki/Jamie_Zawinski#Attributed), but
it's a handy tool to have in your belt under certain circumstances.

## Basic Usage

Regular expressions in MySQL are invoked with the
[`REGEXP`](http://dev.mysql.com/doc/refman/5.1/en/regexp.html) keyword,
aliased to `RLIKE`. The most basic usage is a hardcoded regular
expression on the right-hand side of a conditional clause, e.g.:

    SELECT * FROM users WHERE email RLIKE '^[a-c].*[0-9]@';

This SQL would grab every user whose email address begins with 'a', 'b',
or 'c' and has a digit as the final character of its local portion.

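To make the matching rule concrete, here is a minimal sketch against a throwaway table. The `users_demo` table and its rows are made up for illustration; the real `users` schema isn't shown in this post.

```sql
-- Hypothetical demo table; not the real `users` schema.
CREATE TABLE users_demo (
  id INT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL
);

INSERT INTO users_demo (email) VALUES
  ('alice7@example.com'),  -- starts with 'a', digit before '@': matches
  ('carol99@example.com'), -- starts with 'c', digit before '@': matches
  ('bob@example.com'),     -- starts with 'b', but no digit before '@'
  ('dave42@example.com');  -- digit before '@', but starts with 'd'

-- REGEXP and RLIKE are interchangeable; both queries select the first two rows.
SELECT email FROM users_demo WHERE email RLIKE  '^[a-c].*[0-9]@';
SELECT email FROM users_demo WHERE email REGEXP '^[a-c].*[0-9]@';

-- NOT RLIKE inverts the match and selects the last two rows.
SELECT email FROM users_demo WHERE email NOT RLIKE '^[a-c].*[0-9]@';
```

The pattern is anchored at the start with `^` but not at the end, so anything may follow the `@`.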
## Something More Advanced

The regex used with `RLIKE` does not need to be hardcoded into the SQL
statement, and can *in fact* be a column in the table being queried. In
a recent project, we were tasked with creating an interface for managing
redirect rules à la
[mod_rewrite](http://httpd.apache.org/docs/current/mod/mod_rewrite.html).
We were able to do the entire match in the database, using SQL like this
(albeit with a few more joins, groups and orders):

    SELECT * FROM redirect_rules WHERE '/news' RLIKE pattern;

In this case, `'/news'` is the incoming request path and `pattern` is the
column that stores the regular expression. In our benchmarks, we found
this approach to be much faster than doing the regular expression
matching in Ruby, mostly because it avoids ActiveRecord overhead.

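A rough sketch of how such a rules table might look is below. Only the `pattern` column comes from the query above; the `redirect_rules_demo` table name and the `target` and `priority` columns are assumptions made up for this example, not the project's actual schema.

```sql
-- Hypothetical schema: only the `pattern` column is named in the article.
CREATE TABLE redirect_rules_demo (
  id INT AUTO_INCREMENT PRIMARY KEY,
  pattern VARCHAR(255) NOT NULL,   -- regular expression matched against the path
  target VARCHAR(255) NOT NULL,    -- assumed: where a matching request is sent
  priority INT NOT NULL DEFAULT 0  -- assumed: lower numbers win
);

INSERT INTO redirect_rules_demo (pattern, target, priority) VALUES
  ('^/news$',     '/articles',         1),
  ('^/news(/|$)', '/articles/archive', 2),
  ('^/blog(/|$)', '/articles',         3);

-- The first two rules both match '/news'; ORDER BY and LIMIT pick the
-- one with the lowest priority number.
SELECT target
FROM redirect_rules_demo
WHERE '/news' RLIKE pattern
ORDER BY priority
LIMIT 1;
```

In an application, the `'/news'` literal would normally be a bound parameter holding the request path rather than a string spliced into the SQL.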
## Caveats

Using regular expressions in your SQL has the potential to be slow.
A regex condition can't use an index, so a full table scan is required.
If you can get away with using `LIKE`, which offers simple wildcard
matching (`%` and `_`), you should. As always: benchmark, benchmark,
benchmark.

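One way to check this on your own data is to compare query plans with `EXPLAIN`. The sketch below assumes an index on `users.email`; the index name is hypothetical and the real schema isn't shown in this post. In the `EXPLAIN` output, compare the `type` and `key` columns between the two queries.

```sql
-- Assumed for illustration: an index on the column being searched.
CREATE INDEX index_users_on_email ON users (email);

-- Anchored prefix match written as a regex: the index is not used.
EXPLAIN SELECT * FROM users WHERE email RLIKE '^abc';

-- The same prefix condition as a LIKE with a trailing wildcard,
-- which can be satisfied with a range scan on the index.
EXPLAIN SELECT * FROM users WHERE email LIKE 'abc%';
```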
Additionally, MySQL (as of the 5.1 release linked above) supports
[POSIX](https://en.wikipedia.org/wiki/POSIX) regular expressions, not
the Perl-style syntax ([PCRE](http://www.pcre.org/) and the like) that
Ruby offers. There are things (like negative lookaheads) that you simply
can't do, though you probably ought not to be doing them in your SQL
anyway.

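When a negative lookahead is all that's missing, one workaround is to pair a positive match with a negated one. This is only a sketch: the `requests` table and its `path` column are hypothetical names used for illustration.

```sql
-- A Perl-style pattern such as ^/news(?!/archive) is not available in
-- POSIX syntax. Splitting it into two conditions gives the same result:
-- paths that start with /news but not with /news/archive.
SELECT path
FROM requests  -- hypothetical table with a `path` column
WHERE path RLIKE '^/news'
  AND path NOT RLIKE '^/news/archive';
```

This doubles the number of regex evaluations per row, so the benchmarking advice above applies here too.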
## In PostgreSQL

Support for regular expressions in PostgreSQL is similar to that of
MySQL, though the syntax is different (e.g. `email ~ '^a'` instead of
`email RLIKE '^a'`). What's more, Postgres contains some useful
functions for working with regular expressions, like `substring` and
`regexp_replace`. See the
[documentation](http://www.postgresql.org/docs/9.0/static/functions-matching.html)
for more information.

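For comparison, here is a short sketch of the PostgreSQL equivalents. It reuses the `users` table and `email` column from the MySQL examples purely for illustration; the `domain` and `masked` aliases are made up.

```sql
-- PostgreSQL syntax. ~ is a case-sensitive match, ~* is case-insensitive.
SELECT * FROM users WHERE email ~  '^[a-c].*[0-9]@';
SELECT * FROM users WHERE email ~* '^[a-c].*[0-9]@';

-- substring(string from pattern) returns the text matched by the first
-- parenthesized group, here everything after the '@'.
SELECT substring(email from '@(.*)$') AS domain FROM users;

-- regexp_replace(source, pattern, replacement) rewrites the first match,
-- here swapping the domain for a placeholder.
SELECT regexp_replace(email, '@.*$', '@example.com') AS masked FROM users;
```

Note that MySQL's `RLIKE` is case-insensitive for ordinary (non-binary) strings, so `~*` is often the closer counterpart.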
## Conclusion

In certain circumstances, regular expressions in SQL are a handy
technique that can lead to faster, cleaner code. Don't use `RLIKE` when
`LIKE` will suffice, and be sure to benchmark your queries with datasets
similar to the ones you'll be facing in production.