You'd think that often people would need to sort lists ignoring the definite article 'the'. Film titles, band names, Sherlock Holmes's cases. But a search of the web brings precious little, especially as you're searching for 'the'. Which is practically everywhere.
How do you order a list, e.g. of bands, whilst ignoring 'The': The Hives, The Strokes, etc, whilst keeping order with other bands such as Pram and Blur. I searched high and low for it, but with special thanks to James Wilford, we've come up with an answer.
A simple response to the problem would be to fetch the results and then sort using SQL having stripped the starting 'the'. However, in the project I'm working on currently, there's a need to allow for paging.
This means using SQL to strip the 'the' and using LIMIT to select only the current page of results.
You can order MySQL expressions according to other expressions, so, instead of a PHP function, instead of a new column with the same entry but with the 'the' removed (both of which I found as answers), you make use of the expression clause in ORDER BY.
The expression I first attempted was using REGEXP, which allows you to search or order, etc according to a regular expression. To start, I tried to order based on whether or not an artist contains, simply a letter. I had
SELECT 'Blur' REGEXP 'b';
That's true, hence => 1; is the results set (Note: REGEXP is not case sensitive for strings (i.e. non-binary). In order to do this, you need to cast the search parameter as binary, see http://dev.mysql.com/doc/mysql/en/String_comparison_functions.html).
SELECT 'The Hives' REGEXP 'The ';
Is also true. When I put it in my ORDER BY expression
SELECT * FROM artists ORDER BY ( artist REGEXP 'The ' < )
I got this result:
Blur Pram The Hives The Strokes
The problem here is that the ORDER BY expresion orders first by things that don't match, then those that do match the regular expression. Since REGEXP only ever returns 0 or 1, you won't get the sorted list you're looking for.
The master stroke comes in thinking that we just want to order the names of the bands, no matter if there's a 'The ' in there or not. This means, that for the purposes of the ORDER BY, we can trim the beginning 'The '.
This is where James' master-stroke comes in. We can use MySQL's string functions to trim only the parameter of the ORDER BY clause leaving the results intact! Happily, a function to do this exists in MySQL: TRIM.
The code:
SELECT artist FROM artists ORDER BY TRIM(LEADING 'The ' FROM artist);
(Note: don't leave a space between TRIM and the open parentheses: you'll get a syntax error!)
What this code does is a simple SELECT, then orders the results by a function acting on 'artist'. TRIM will either slice off the LEADING or TRAILING or BOTH examples of a removal string: in this case 'The '. (For the record, if you don't specify LEADING or TRAILING or BOTH and the removal string, TRIM will trim the whitespace on either side. see
http://dev.mysql.com/doc/mysql/en/String_functions.html)
This gives us what we're looking for:
Blur
The Hives
Pram
The Strokes
And devotees of the bands are happy!