---
title: "Regular Expressions in MySQL"
date: 2011-09-28T00:00:00+00:00
draft: false
canonical_url: https://www.viget.com/articles/regular-expressions-in-mysql/
---
Did you know MySQL supports using [regular
expressions](https://en.wikipedia.org/wiki/Regular_expression) in
`SELECT` statements? I'm surprised at the number of developers who
don't, despite using SQL and regexes on a daily basis. That's not to say
that putting a regex into your SQL should be a daily occurrence. In
fact, it can [cause more problems than it
solves](https://en.wikiquote.org/wiki/Jamie_Zawinski#Attributed), but
it's a handy tool to have in your belt under certain circumstances.
## Basic Usage
Regular expressions in MySQL are invoked with the
[`REGEXP`](http://dev.mysql.com/doc/refman/5.1/en/regexp.html) operator,
aliased to `RLIKE`. The most basic usage is a hardcoded regular
expression on the right-hand side of a conditional clause, e.g.:
```sql
SELECT * FROM users WHERE email RLIKE '^[a-c].*[0-9]@';
```
This SQL would grab every user whose email address begins with 'a', 'b',
or 'c' and has a digit as the final character of its local portion.
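
To try the pattern without a `users` table, the same match can be run against string literals. This is a minimal sketch: the addresses are made up for illustration, and each expression returns `1` for a match and `0` otherwise.

```sql
-- Hypothetical addresses, used only to exercise the pattern.
SELECT 'alice7@example.com' RLIKE '^[a-c].*[0-9]@';  -- 1: starts with 'a', digit right before '@'
SELECT 'bob@example.com'    RLIKE '^[a-c].*[0-9]@';  -- 0: no digit before '@'
SELECT 'dave7@example.com'  RLIKE '^[a-c].*[0-9]@';  -- 0: starts with 'd'
```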
## Something More Advanced
The regex used with RLIKE does not need to be hardcoded into the SQL
statement, and can *in fact* be a column in the table being queried. In
a recent project, we were tasked with creating an interface for managing
redirect rules à la
[mod_rewrite](http://httpd.apache.org/docs/current/mod/mod_rewrite.html).
We were able to do the entire match in the database, using SQL like this
(albeit with a few more joins, groups and orders):
```sql
SELECT * FROM redirect_rules WHERE '/news' RLIKE pattern;
```
In this case, `'/news'` is the incoming request path and `pattern` is the
column that stores the regular expression. In our benchmarks, we found
this approach to be much faster than doing the regular expression
matching in Ruby, mostly because of the lack of ActiveRecord overhead.
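
A self-contained way to experiment with this idea might look like the following. Only the `redirect_rules` table and its `pattern` column come from the query above; the `id` and `target` columns and the sample rules are assumptions made for illustration.

```sql
-- Hypothetical schema: only `pattern` appears in the original query.
CREATE TEMPORARY TABLE redirect_rules (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  pattern VARCHAR(255) NOT NULL,  -- regular expression tested against the request path
  target  VARCHAR(255) NOT NULL   -- assumed column: where a matching request is sent
);

INSERT INTO redirect_rules (pattern, target) VALUES
  ('^/news',             '/blog'),
  ('^/about/?$',         '/company'),
  ('^/products/[0-9]+$', '/catalog');

-- The literal is the subject and the column supplies the regex,
-- so every row's pattern is tested against the same path.
SELECT pattern, target FROM redirect_rules WHERE '/news' RLIKE pattern;
-- Returns only the row whose pattern is '^/news'.
```

Swapping `'/news'` for `'/products/42'` should return the third rule instead, which is a quick way to check that the patterns are anchored the way you expect.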
## Caveats
Using regular expressions in your SQL has the potential to be slow.
MySQL can't use an index to evaluate a regex condition, so unless
another condition narrows the rows, a full table scan is required. If
you can get away with using `LIKE`, whose `%` and `_` wildcards cover
the simplest cases, you should. As always: benchmark, benchmark, benchmark.
Additionally, MySQL supports
[POSIX](https://en.wikipedia.org/wiki/POSIX) regular expressions, not
the Perl-style syntax popularized by [PCRE](http://www.pcre.org/) and
offered by Ruby. There are things (like negative lookaheads) that you
simply can't do, though you probably ought not to be doing them in your
SQL anyway. (MySQL 8.0 and later use the ICU regex library instead,
which does support lookaheads.)
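
To see the `LIKE` advice in practice, one option is to compare query plans for two conditions that select the same rows. The sketch below assumes a hypothetical `users` table with an index on `email`; the index name is made up, and the plans you get will depend on your MySQL version and data, so treat it as a starting point for your own benchmarks.

```sql
-- Hypothetical table with an index on the column being searched.
CREATE TEMPORARY TABLE users (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  INDEX idx_users_email (email)
);

-- Both conditions select addresses starting with 'a'. The LIKE form has a
-- fixed prefix, so the optimizer can consider a range scan on the index.
EXPLAIN SELECT id FROM users WHERE email LIKE 'a%';

-- The regex form has to be checked against every row.
EXPLAIN SELECT id FROM users WHERE email RLIKE '^a';
```

Comparing the `type` and `rows` columns of the two plans on a production-sized dataset gives a rough idea of how much work each query does.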
## In PostgreSQL
Support for regular expressions in PostgreSQL is similar to that of
MySQL, though the syntax is different (e.g. `email ~ '^a'` instead of
`email RLIKE '^a'`). What's more, Postgres contains some useful
functions for working with regular expressions, like `substring` and
`regexp_replace`. See the
[documentation](http://www.postgresql.org/docs/9.0/static/functions-matching.html)
for more information.
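
As a rough illustration of the PostgreSQL features mentioned above, the following statements use the match operator and both functions on a made-up address.

```sql
-- PostgreSQL syntax; the address is a hypothetical example.
SELECT 'alice7@example.com' ~ '^a';                         -- t
SELECT substring('alice7@example.com' from '[0-9]+');       -- 7
SELECT regexp_replace('alice7@example.com', '[0-9]+', '');  -- alice@example.com
```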
## Conclusion
In certain circumstances, regular expressions in SQL are a handy
technique that can lead to faster, cleaner code. Don't use `RLIKE` when
`LIKE` will suffice, and be sure to benchmark your queries with datasets
similar to the ones you'll be facing in production.