SQL Performance Explained

It’s short and very well-written. It’s the most valuable read on SQL I’ve ever seen. The book is available online.

The most important thing for backend developers is “Do NOT use OFFSET for pagination”.

Powers of indexing:

  1. The B-Tree traversal. It could be traversed in any direction as leaf nodes are organized in a double linked list.
  2. Clustering. Similar data is close, especially if it’s an index-only scan.
  3. Pipelined order by. Database can omit sort operation as an index is already sorted.

Rules:

  1. Index for equality first—then for ranges. The actual performance difference depends on the data and search criteria. The difference can be negligible if the filter on the range is very selective on its own. The bigger the date range becomes, the bigger the performance difference will be.
  2. Index could be useless for LIKE queries. Only the part before the first wild card serves as an access predicate. The remaining characters do not narrow the scanned index range—non-matching entries are just left out of the result.
    For the PostgreSQL database, the problem is different because PostgreSQL assumes there is a leading wild card when using bind parameters for a LIKE expression. PostgreSQL just does not use an index in that case. The only way to get an index access for a LIKE expression is to make the actual search term visible to the optimizer. If you do not use a bind parameter but put the search term directly into the SQL statement, you must take other precautions against SQL injection attacks!
  3. Maintain your index-only scans. Add comments that remind you about an index-only scan and refer to that page so anyone can read about it.
  4. Optimize access to tables with a singular index. Tables with one index only are best implemented as clustered indexes or index-organized tables.
    Tables with more indexes can often benefit from heap tables. You can still use index-only scans to avoid table access. This gives you the select performance of a clustered index without slowing down other indexes.

Myths:

  1. Indexes Can Degenerate. Rebuilding an index might reduce the number of leaf nodes by about 20% - 30%. The most you can possibly expect from this reduction is 20%-30% for very expensive operations like a FULL INDEX SCAN. The typical INDEX UNIQUE SCAN gain of an index rebuild is 0%-2% because the depth of the index is not reduced by the rebuild.
  2. Most Selective First. There is the myth that you should always put the most selective column to the first position; that is just wrong. The most important consideration when defining a concatenated index is how to choose the column order so it can be used as often as possible.
  3. Oracle Cannot Index NULL.
  4. Dynamic SQL is Slow.
  5. Select * is Bad.

I wish there was a book about other index types (this one explains B-tree indexes in detail).