The problem
Your API needs to return a list of entities. The list can be quite big, even if you designed useful filters.
Most of the time a client doesn’t need the entire list because:
- UX will suffer. If you have a frontend application, users don’t need to scroll thousands of entities at once.
- Performance will suffer. This happens on both the client and the backend: the client needs to parse and render all retrieved entities, and the backend needs to read them all from the database.
The solution is to introduce pagination to your API.
This article walks through the choices you have for the API design and the backend implementation.
Offset Pagination
Popular and simple. It is popular because SQL has LIMIT and OFFSET out of the box. It is simple because you don’t need a lot of business logic to wrap it.
Typical usage
- A client makes a request for the first page:
GET /entities?limit=30
- To retrieve the next page, the client makes a second request:
GET /entities?limit=30&offset=30
The second request translates to this SQL:
SELECT * FROM entities
ORDER BY id
LIMIT 30
OFFSET 30;
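The whole flow fits in a few lines. Here is a minimal sketch using Python’s built-in sqlite3 module and a made-up entities table, not a production handler:

```python
import sqlite3

def fetch_page(conn, limit, offset):
    # Offset pagination: the database still has to walk past `offset` rows
    # before it can return anything.
    rows = conn.execute(
        "SELECT id FROM entities ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [r[0] for r in rows]

# Demo table with ids 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO entities (id) VALUES (?)",
                 [(i,) for i in range(1, 101)])

page1 = fetch_page(conn, limit=30, offset=0)   # ids 1..30
page2 = fetch_page(conn, limit=30, offset=30)  # ids 31..60
```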
Pros
- Easy to implement. Many frameworks support it out of the box.
- Supports random access to pages. A client can go from page 1 straight to page 10, then to page 5. It’s rarely needed, though.
Cons
- Kills database on large offsets.
- Pages drift.
SQL’s OFFSET allows skipping the first N results of a query. However, the database must still read and sort those first N rows before it can drop them and return the rest. This is not an implementation problem, it’s the way offset is designed:
…the rows are first sorted according to the <order by clause> and then limited by dropping the number of rows specified in the <result offset clause> from the beginning…
That means big offsets impose a lot of work on the database.
And even when performance is not a big problem, page drift is.
Imagine we have 8 entities registered in the system. Pages are of size 3, and entities are sorted by created_at, newest first. The user goes to page 2 using offset pagination. To him, all entity pages look like this (page boundaries are brackets):
[0, 1, 2], [3, 4, 5 (user here)], [6, 7]. The next offset is 6.
While the user is staring at the entities, two new entities X and Y get registered in the system (by another user or by the system). The user clicks to see the next page, and what does he see?
[X, Y, 0], [1, 2, 3], [4, 5, 6 (user here)], [7].
He sees entities 4 and 5 again even though he moved to the next page. For a UI this may be tolerable, if you don’t mind users being confused by seeing the same entities twice. Or by missing some entirely, because offset pagination can also shift pages the other way, so entities get lost. But it’s a huge pain if you want to automate this API: the client can never be sure whether it has already seen a given entity, and client-side deduplication kills any adequate usage pattern of offset pagination.
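The drift is easy to reproduce with plain Python lists standing in for the newest-first result set:

```python
# Entities sorted newest first; pages are of size 3.
entities = [0, 1, 2, 3, 4, 5, 6, 7]
page_size = 3

page2 = entities[3:3 + page_size]   # the user reads [3, 4, 5]
next_offset = 6

# Two new entities X and Y get registered and land at the top.
entities = ["X", "Y"] + entities

page3 = entities[next_offset:next_offset + page_size]
# page3 is [4, 5, 6]: entities 4 and 5 show up again on the "next" page
```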
That means an API using offset pagination is unreliable and does not scale. It can only be used for a small number of entities. But what’s the point of introducing pagination if you don’t have many entities in the first place? I don’t think it belongs in modern APIs.
Simple keyset pagination
Keyset pagination uses information about the current page to fetch the next page.
Typical usage
- A client makes a request for the first page:
GET /entities?limit=20
- To retrieve the next page, the client makes a second request:
GET /entities?limit=20&created_after=2021-07-07T14:32:01
The client takes the created_after parameter from the created_at field of the last entity of the current page.
The second request translates to:
SELECT * FROM entities
WHERE
created_at > '2021-07-07T14:32:01'
ORDER BY created_at
LIMIT 20
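A runnable sketch of this query, again with sqlite3 and a made-up created_at column. Note the strict > comparison, which assumes timestamps are unique; an index on created_at lets the database seek to the boundary instead of scanning skipped rows:

```python
import sqlite3

def fetch_after(conn, created_after, limit):
    # Keyset pagination: seek past the boundary value, then read `limit` rows.
    return conn.execute(
        "SELECT id, created_at FROM entities "
        "WHERE created_at > ? ORDER BY created_at LIMIT ?",
        (created_after, limit),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_entities_created ON entities (created_at)")
conn.executemany("INSERT INTO entities VALUES (?, ?)",
                 [(i, f"2021-07-07T14:32:{i:02d}") for i in range(60)])

first = fetch_after(conn, "", limit=20)
# The client sends the created_at of the last entity as created_after.
second = fetch_after(conn, first[-1][1], limit=20)  # starts at id 20
```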
Pros
- Works with existing filters of the API. You only need to add the limit parameter.
- No page drift (almost).
- Consistent performance, especially if you use database indexes.
Cons
- Tight coupling of pagination and sorting. Forces API users to add filters even if no filters are intended.
- Does not work well when field values are often duplicated. Even with timestamps pages may drift, and for fields like enums the order within equal values is effectively random.
- Complicated for clients when they want to combine sorting and pagination on the same field.
Simple keyset pagination works very well for data with a naturally ordered, high-resolution key, such as a precise timestamp.
Keyset pagination on natural keys
This one extends the simple approach by introducing a separate cursor API parameter. The parameter is based on unique identifiers present in database rows.
Typical usage
- A client makes a request for the first page:
GET /entities?limit=20
- To retrieve the next page, the client makes a second request:
GET /entities?limit=20&cursor=33
The client takes the cursor parameter from the id field of the last entity of the current page.
The second request translates to:
SELECT * FROM entities
WHERE
id > 33
ORDER BY id
LIMIT 20
In real life, it is usually more complex.
First, we still need to support sorting by different fields. For example, if we need to sort entities by name, the cursor has to include the last entity’s name in addition to its id, and the query becomes:
SELECT * FROM entities
WHERE
(name, id) > ('lastName', 33)
ORDER BY name, id
LIMIT 20
Here 'lastName' is the name of the last entity on the current page. A plain id > 33 filter would be wrong: sorted by name, rows with smaller ids can legitimately appear on later pages.
The backend needs to carefully combine the cursor parameter with filters and sorting.
Second, sometimes it’s not easy to choose which natural key to use. This is especially true for complex cross-table queries. Every situation is unique, and you probably need to invent a composite cursor.
Third, natural keys are not necessarily ordered. The simplest example is using UUIDs to identify entities. GET /entities?limit=20&cursor={uuid}&sort_by=name won’t work as expected, because a query like this makes no sense for UUIDs:
SELECT * FROM entities
WHERE
id > 'd4f0a9e8-ae6e-43fa-ad78-740ed8f079c0'
ORDER BY name, id
LIMIT 20
You’ll need composite cursors then, which look like this: (name, uuid).
Then the query GET /entities?limit=20&cursor=entityName,d4f0a9e8-ae6e-43fa-ad78-740ed8f079c0&sort_by=name translates to
SELECT * FROM entities
WHERE
(name, id) > ('entityName', 'd4f0a9e8-ae6e-43fa-ad78-740ed8f079c0')
ORDER BY name, id
LIMIT 20
The row-value comparison resumes right after the last seen (name, id) pair, so entities sharing the same name are neither skipped nor repeated.
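A quick way to convince yourself that the row-value comparison handles duplicated names: SQLite (3.15+) supports it too, so the behavior can be checked with the standard sqlite3 module (table and data here are made up):

```python
import sqlite3

def fetch_after(conn, last_name, last_id, limit):
    # Composite keyset: resume exactly after the (name, id) pair of the
    # last entity; duplicate names are neither skipped nor repeated.
    return conn.execute(
        "SELECT name, id FROM entities "
        "WHERE (name, id) > (?, ?) ORDER BY name, id LIMIT ?",
        (last_name, last_id, limit),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO entities VALUES (?, ?)",
                 [("a1", "alpha"), ("a2", "alpha"),
                  ("b1", "beta"), ("c1", "gamma")])

page = fetch_after(conn, "alpha", "a1", limit=2)
# page is [("alpha", "a2"), ("beta", "b1")]: the second "alpha" entity
# is not lost even though its name equals the one on the cursor
```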
It looks pretty bad in the URL and exposes implementation details to clients. But it’s not a problem, it’s an opportunity.
The backend can encode the cursor so the client won’t even try to interpret and modify it.
For example, use base64 + JSON (a JWT-like approach). The cursor from the example above becomes cursor = base64({"name":"entityName","id":"d4f0a9e8-ae6e-43fa-ad78-740ed8f079c0"}), i.e. it will look like eyJuYW1lIjoiZW50aXR5TmFtZSIsImlkIjoiZDRmMGE5ZTgtYWU2ZS00M2ZhLWFkNzgtNzQwZWQ4ZjA3OWMwIn0=.
You can even start by base64-ing a JSON like {"offset": 10}, ship the API layer first, and then carefully refactor the database layer when you hit offset’s problems.
In other words, encoded cursors buy great implementation flexibility in exchange for some complexity.
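Encoding and decoding such a cursor takes a few lines of standard-library Python; the key names here are illustrative, not a fixed scheme:

```python
import base64
import json

def encode_cursor(payload):
    # Serialize to compact JSON, then URL-safe base64: opaque to clients.
    raw = json.dumps(payload, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))

cursor = encode_cursor({"name": "entityName",
                        "id": "d4f0a9e8-ae6e-43fa-ad78-740ed8f079c0"})

# The "start with offset, refactor later" trick uses the same mechanism:
offset_cursor = encode_cursor({"offset": 10})
```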
Pros
- No coupling of pagination logic to filter logic.
- No page drift.
- Consistent performance.
Cons
- No out-of-the-box support in tools. There are examples for Django and jOOQ, but not many other frameworks support it.
- More complex for backend to implement.
- Cursors can break if entities are deleted from the database.
Of course, it’s not good to force the client to figure out the cursor value from clues in the entities. For this reason, it’s recommended to return responses like this:
{
  "items": [],
  "page": {
    "next": "encoded cursor value",
    "prev": "encoded cursor value"
  }
}
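Building such a response is straightforward. A sketch under the assumption that the cursor is just the id of the first/last item on the page (the helper names are made up):

```python
import base64
import json

def encode_cursor(payload):
    # Opaque, URL-safe cursor token.
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

def build_page_response(items):
    # Wrap a page of items with ready-made cursors so clients never have
    # to derive them from entity fields.
    page = {"next": None, "prev": None}
    if items:
        page["next"] = encode_cursor({"id": items[-1]["id"]})
        page["prev"] = encode_cursor({"id": items[0]["id"]})
    return {"items": items, "page": page}

resp = build_page_response([{"id": 31}, {"id": 32}])
```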
Conclusion
I would recommend using some form of keyset pagination for all new APIs. It is consistent and flexible.