PostgreSQL custom SQL collations

I had used this feature before and forgotten about it, but it’s really cool.

In Postgres you can use expressions in order-by clauses.  This is incredibly powerful because it allows you to have your own custom collation or sort sequence for “order by” clauses.

For example, I have a simple table of people and tshirt sizes:


tshirts=# \d tshirts
Table "public.tshirts"
   Column   |     Type     |                      Modifiers
------------+--------------+------------------------------------------------------
 id         | integer      | not null default (nextval('tshirts_id_seq'::regclass)
 name       | text         | not null
 shirt_size | character(2) |

Indexes:
 "tshirts_pkey" PRIMARY KEY, btree (id)

Check constraints:
 "tshirts_shirt_size_check" CHECK (shirt_size = ANY (ARRAY['S'::bpchar, 'M'::bpchar, 'L'::bpchar, 'XL'::bpchar, '2X'::bpchar, '3X'::bpchar, '4X'::bpchar, '5X'::bpchar]))

Here is the most common method most people would use to get tshirt counts to populate a tshirt order form:


select shirt_size
     , count(*)
  from tshirts
 group by shirt_size
 order by shirt_size
;

It returns this:


 shirt_size | count
------------+-------
 2X         |    13
 3X         |     7
 4X         |     1
 5X         |     1
 L          |    38
 M          |    45
 S          |     5
 XL         |    28

But I’d rather have it in a more natural order like below:


 shirt_size | count
------------+-------
 S          |     5
 M          |    45
 L          |    38
 XL         |    28
 2X         |    13
 3X         |     7
 4X         |     1
 5X         |     1

Here’s one way to accomplish that easily:


select shirt_size
     , count(*)
  from tshirts
 group by shirt_size
 order by case shirt_size
            when 'S' then 1
            when 'M' then 2
            when 'L' then 3
            when 'XL' then 4
            when '2X' then 5
            when '3X' then 6
            when '4X' then 7
            when '5X' then 8
            else 9
          end   asc
;

I threw in the “else” clause to capture anything I may have missed, or if any bad data made into the table somehow.

And it works exactly like I want.

This is a very simple example, but can be very useful in some cases.

Here’s a more real-world example.  I use this SQL to show me postgres threads, putting the active ones at the top, idle-in-trans next, then idle threads at the end, and all sorted oldest first for each group:


select a.pid as "PID"
     , a.datname as "DB"
     , a.usename as "User"
     , case when a.client_addr is null
         then 'local'
         else host(a.client_addr)
       end as "IP"
     , date_trunc('seconds',a.query_start) as "Start"
     , case waiting when false then ' ' when true then 'Y' end as "W"
     , case state when 'idle in transaction' then 'idle-trn' else state end as "State"
     , case state
         when 'active' then ltrim(regexp_replace(a.query,E'[\n\f\t\r ]+',' ','g'))
         when 'idle in transaction' then ltrim(regexp_replace(a.query,E'[\n\f\t\r ]+',' ','g'))
         else ''
       end as "Active Query"
  from pg_stat_activity a
 where a.pid != pg_backend_pid()
 order by state != 'idle' desc
        , state != 'idle in transaction' desc
        , query_start
 limit 20
;

And here’s what it gives me:


 PID  |   DB    |  User   |  IP   |         Start          | W |  State   |                               Active Query
------+---------+---------+-------+------------------------+---+----------+--------------------------------------------------------------------------
 4745 | pgbench | pgbench | local | 2017-05-23 16:03:32-05 |   | active   | delete from pgbench_history where mtime < now() - '3 minutes'::interval;
 4772 | pgbench | pgbench | local | 2017-05-23 16:03:32-05 |   | active   | vacuum;
 4778 | pgbench | pgbench | local | 2017-05-23 16:03:32-05 |   | active   | select pg_sleep(9);
 4844 | pgbench | pgbench | local | 2017-05-23 16:03:33-05 | Y | active   | UPDATE deadlock_test_table SET data='goodbye' WHERE id=1;
 4856 | pgbench | pgbench | local | 2017-05-23 16:03:33-05 | Y | active   | select count(*) from pglocktest_user;
 4863 | pgbench | pgbench | local | 2017-05-23 16:03:33-05 |   | active   | select pg_sleep(4);
 4858 | pgbench | pgbench | local | 2017-05-23 16:03:33-05 | Y | active   | truncate pgbench_history
 4903 | pgbench | pgbench | local | 2017-05-23 16:03:34-05 | Y | active   | vacuum;
 4914 | pgbench | pgbench | local | 2017-05-23 16:03:34-05 |   | active   | select pg_sleep(5);
 4846 | pgbench | pgbench | local | 2017-05-23 16:03:35-05 | Y | active   | UPDATE deadlock_test_table SET data='goodbye' WHERE id=2;
 4784 | pgbench | pgbench | local | 2017-05-23 16:03:32-05 |   | idle-trn | BEGIN;
 4845 | pgbench | pgbench | local | 2017-05-23 16:03:32-05 |   | idle-trn | LOCK TABLE pglocktest_user;
 4888 | pgbench | pgbench | local | 2017-05-23 16:03:33-05 |   | idle-trn | BEGIN;
 4924 | pgbench | pgbench | local | 2017-05-23 16:03:35-05 |   | idle-trn | BEGIN;
 4785 | pgbench | pgbench | local | 2017-05-23 16:03:32-05 |   | idle     |
 4889 | pgbench | pgbench | local | 2017-05-23 16:03:33-05 |   | idle     |
(16 rows)

I hope you have fun with it and find it useful!

Advertisements