Recommended Pagination for APIs


The problem

Your API needs to return a list of entities. The list can be quite big, even if you designed useful filters.

Most of the time a client doesn’t need the entire list at once.

The solution is to introduce pagination to your API.

This article clarifies the choices you have regarding API design and backend implementation.

Offset Pagination

Popular and simple. It is popular because SQL has LIMIT and OFFSET out of the box. It is simple because you don’t need a lot of business logic to wrap it.

Typical usage

  1. A client makes a request for the first page: GET /entities?limit=30
  2. To retrieve the next page, the client makes a second request: GET /entities?limit=30&offset=30

The second request translates to this SQL:

SELECT * FROM entities
ORDER BY id
LIMIT 30
OFFSET 30;
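On the backend, the steps above can be sketched as a small helper that turns the limit/offset query parameters into a parameterized query. This is a minimal sketch with hypothetical names (build_offset_query, MAX_LIMIT); the point is clamping user input and never interpolating it into SQL:

```python
# Minimal sketch of offset-pagination query building (names are hypothetical).
MAX_LIMIT = 100  # assumed server-side cap on page size

def build_offset_query(limit: int, offset: int) -> tuple[str, tuple]:
    # Clamp user-supplied values instead of trusting them.
    limit = max(1, min(int(limit), MAX_LIMIT))
    offset = max(0, int(offset))
    # Placeholders keep user input out of the SQL text.
    sql = "SELECT * FROM entities ORDER BY id LIMIT %s OFFSET %s"
    return sql, (limit, offset)
```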

Pros

  - Simple to implement: SQL supports LIMIT and OFFSET out of the box.
  - The client can jump to an arbitrary page.

Cons

  - Big offsets force the database to read and sort all the skipped rows.
  - Page drift: concurrent inserts shift page boundaries, so entities repeat or get lost.


SQL’s OFFSET allows skipping the first N results of a query. However, the database must read and sort the first N rows before it can return the rest. This is not an implementation problem; it’s the way offset is designed:

…the rows are first sorted according to the <order by clause> and then limited by dropping the number of rows specified in the <result offset clause> from the beginning…
SQL:2016, Part 2, §4.15.3 Derived tables

That means big offsets impose a lot of work on the database.


Even when performance is not a problem, page drift is.

Imagine we have 8 entities registered in the system. Pages are of size 3, and entities are sorted by created_at. The user goes to page 2 using offset pagination. To them, the entity pages look like this (page boundaries are brackets):

[0, 1, 2], [3, 4, 5 (user here)], [6, 7]. The next offset is 6.

While the user is staring at the entities, two new entities X and Y get registered in the system (by another user or by the system itself). The user clicks to see the next page, and what do they see?

[X, Y, 0], [1, 2, 3], [4, 5, 6 (user here)], [7].

They see entities 4 and 5 again even though they went to the next page. For UIs this is tolerable only if you don’t mind your users being confused by seeing the same entities twice, or missing some entirely: offset pagination can also shift pages the other way, so entities get lost. But it’s a huge pain if you want to automate this API. The client can never be sure whether it has already seen an entity, and client-side deduplication defeats any reasonable usage of offset pagination.
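The drift above is easy to reproduce with an in-memory list standing in for the database (newest entities first; offset_page is a hypothetical helper mimicking LIMIT/OFFSET):

```python
# Entities newest-first, after X and Y were registered.
entities = ["X", "Y", 0, 1, 2, 3, 4, 5, 6, 7]

PAGE_SIZE = 3

def offset_page(items, offset, limit=PAGE_SIZE):
    # Mimics SELECT ... LIMIT limit OFFSET offset.
    return items[offset:offset + limit]

# Before X and Y existed, page 2 (offset 3) was [3, 4, 5]; next offset = 6.
# After the inserts, the "next page" at offset 6 is:
print(offset_page(entities, 6))  # [4, 5, 6]: entities 4 and 5 repeat
```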


That means an API using offset pagination is unreliable and doesn’t scale. It can only be used for a small number of entities, and what’s the point of introducing pagination if you don’t have many entities in the first place? I don’t think it should be used in modern APIs.

Simple keyset pagination

Keyset pagination uses information about the current page to fetch the next page.

Typical usage

  1. A client makes a request for the first page: GET /entities?limit=20
  2. To retrieve the next page, the client makes a second request: GET /entities?limit=20&created_after=2021-07-07T14:32:01. The client takes the created_after parameter from the created_at field of the last entity of the current page.

The second request translates to:

SELECT * FROM entities
WHERE
  created_at > '2021-07-07T14:32:01'
ORDER BY created_at
LIMIT 20

Pros

  - Stable pages: new entities don’t shift the boundaries of already-seen pages.
  - Fast: the database filters by a key it can index instead of reading and dropping rows.

Cons

  - Requires a naturally ordered key with high resolution, such as a precise created_at.
  - The client can’t jump to an arbitrary page.


Simple keyset pagination works very well for data with naturally ordered keys such as high-resolution timestamps.
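Replaying the page-drift scenario with keyset pagination shows the difference. This is an in-memory sketch (keyset_page is a hypothetical helper; the entity's value plays the role of created_at, so new entities get the largest keys):

```python
PAGE_SIZE = 3

def keyset_page(items, after=None, limit=PAGE_SIZE):
    # 'after' is the key of the last entity of the previous page;
    # mimics WHERE key > after ORDER BY key LIMIT limit.
    newer = [x for x in sorted(items) if after is None or x > after]
    return newer[:limit]

entities = list(range(8))                 # entities 0..7, ordered by key
page = keyset_page(entities, after=2)     # "page 2" of the drift example
print(page)                               # [3, 4, 5]
entities += [8, 9]                        # two new entities arrive
print(keyset_page(entities, after=page[-1]))  # [6, 7, 8]: no repeats, no losses
```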

Keyset pagination on natural keys

This one extends the simple approach by introducing a separate cursor API parameter. The parameter is based on unique identifiers present in the database rows.

Typical usage

  1. A client makes a request for the first page: GET /entities?limit=20
  2. To retrieve the next page, the client makes a second request: GET /entities?limit=20&cursor=33. The client takes the cursor parameter from the id field of the last entity of the current page.

The second request translates to:

SELECT * FROM entities
WHERE
  id > 33
ORDER BY id
LIMIT 20

In real life, it is usually more complex.


First, we still need to support sorting by different fields. For example, if we need to sort entities by name, a naive combination looks like this:

SELECT * FROM entities
WHERE
  id > 33
ORDER BY name, id
LIMIT 20

The backend needs to carefully combine the cursor parameter with filters and sorting: as written, the query above only returns correct pages when the id order happens to match the name order.


Second, sometimes it’s not easy to choose which natural key to use. This is especially true for complex cross-table queries. Every situation is unique, and you probably need to invent a composite cursor.


Third, natural keys are not necessarily ordered. The simplest example is using UUIDs to identify entities. GET /entities?limit=20&cursor={uuid}&sort_by=name won’t work as expected, because a query like this makes no sense for UUIDs:

SELECT * FROM entities
WHERE
  id > 'd4f0a9e8-ae6e-43fa-ad78-740ed8f079c0'
ORDER BY name, id
LIMIT 20

You’ll need a composite cursor then, and it will look like this: (name, uuid).

Then the request GET /entities?limit=20&cursor=entityName,d4f0a9e8-ae6e-43fa-ad78-740ed8f079c0&sort_by=name translates to

SELECT * FROM entities
WHERE
  (name, id) > ('entityName', 'd4f0a9e8-ae6e-43fa-ad78-740ed8f079c0')
ORDER BY name, id
LIMIT 20
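The row-value comparison (name, id) > (…, …) behaves like tuple comparison in Python, which makes it easy to convince yourself the composite cursor is correct. A sketch over made-up (name, id) rows:

```python
# Hypothetical rows: (name, id) pairs; ids are opaque strings.
rows = [
    ("alpha", "c3"), ("alpha", "a1"), ("beta", "b2"), ("beta", "d4"),
]

def composite_page(rows, cursor=None, limit=2):
    # cursor is the (name, id) pair of the last row of the previous page.
    ordered = sorted(rows)  # ORDER BY name, id
    if cursor is not None:
        # Lexicographic tuple comparison = SQL row-value comparison:
        # name > cursor.name OR (name = cursor.name AND id > cursor.id)
        ordered = [r for r in ordered if r > cursor]
    return ordered[:limit]

first = composite_page(rows)
print(first)                               # [('alpha', 'a1'), ('alpha', 'c3')]
print(composite_page(rows, cursor=first[-1]))  # [('beta', 'b2'), ('beta', 'd4')]
```

Rows that share a name are not skipped and not repeated, because ties on name are broken by id, exactly as in the SQL above.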

It looks pretty bad in the URL and exposes implementation details to clients. But that’s not a problem; it’s an opportunity.

The backend can encode the cursor so the client won’t even try to interpret or modify it. For example, use base64 + JSON (a JWT-like approach). The example above becomes cursor = base64({"name":"entityName","id":"d4f0a9e8-ae6e-43fa-ad78-740ed8f079c0"}), i.e. the cursor will look like eyJuYW1lIjoiZW50aXR5TmFtZSIsImlkIjoiZDRmMGE5ZTgtYWU2ZS00M2ZhLWFkNzgtNzQwZWQ4ZjA3OWMwIn0=.
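A minimal sketch of such encoding in Python (function names are my own; field names match the example above). Note that plain base64 only hides the cursor from casual tampering and is trivially decodable; if clients must not be able to forge cursors, sign the payload as in a real JWT:

```python
import base64
import json

def encode_cursor(fields: dict) -> str:
    # Compact JSON (no spaces), then base64.
    raw = json.dumps(fields, separators=(",", ":")).encode()
    return base64.b64encode(raw).decode()

def decode_cursor(cursor: str) -> dict:
    return json.loads(base64.b64decode(cursor))

cursor = encode_cursor({"name": "entityName",
                        "id": "d4f0a9e8-ae6e-43fa-ad78-740ed8f079c0"})
print(cursor)
# eyJuYW1lIjoiZW50aXR5TmFtZSIsImlkIjoiZDRmMGE5ZTgtYWU2ZS00M2ZhLWFkNzgtNzQwZWQ4ZjA3OWMwIn0=
```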

You can even start by base64-ing JSONs like {"offset": 10}, implement the API level first, and then carefully refactor the database level later, when you hit offset’s problems.

In other words, it can bring great flexibility in implementation in exchange for some complexity.

Pros

  - Works even when natural keys, like UUIDs, are not ordered.
  - An encoded cursor hides implementation details and leaves room to change them later.

Cons

  - More complex to implement: composite cursors must be kept consistent with sorting and filters.
  - The client can’t jump to an arbitrary page.


Of course, it’s not good to force the client to figure out the cursor value from clues in the entities. For this reason, it’s recommended to return responses like this:

{
    "items": [],
    "page": {
        "next": "encoded cursor value",
        "prev": "encoded cursor value"
    }
}
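One common way to build that envelope on the backend is to fetch limit + 1 rows, so the extra row reveals whether a next page exists. A sketch (function names are hypothetical, and prev handling is omitted for brevity):

```python
def paginated_response(rows, limit, make_cursor):
    # rows were fetched with LIMIT limit + 1; the extra row, if present,
    # tells us a next page exists without a second COUNT query.
    items = rows[:limit]
    has_more = len(rows) > limit
    return {
        "items": items,
        "page": {
            # Cursor of the last returned item, or None on the last page.
            "next": make_cursor(items[-1]) if has_more and items else None,
            "prev": None,  # computing prev is symmetric; omitted in this sketch
        },
    }
```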

Conclusion

I would recommend using some form of keyset pagination for all new APIs. It is consistent and flexible.