How to Properly Paginate Through a Database Table with an ORM

The Problem

Fun fact, this is a bad idea:

SELECT * FROM my_table LIMIT 100 OFFSET 1000000;

Why?

When you use OFFSET in SQL, the cursor will still have to scan rows in order to satisfy the offset because it’s not known offhand what row the offset starts at. So instead it’s better to do:

SELECT * FROM my_table WHERE my_table.id > 1000000 LIMIT 100;

How to Paginate with an ORM:

The reason we care about pagination in the first place is to avoid loading a massive query into memory or locking a table for an extended period of time. So if we take the above concept and apply it to an ORM, this is what the code might look like in Flask:

from my_module.db_tools import Session
from my_module.models import ArbitraryDatabaseTable

PAGINATION_BATCH_SIZE = 1000


def generate_all_rows():
    """ Safely iterates over all records in a database without a large lock, slow cursor,
    large memory footprint, or delay from scanning rows. """
    last_element_id = None

    while True:
        query = Session.query(ArbitraryDatabaseTable).order_by(
            ArbitraryDatabaseTable.id.asc(),
        )

        if last_element_id:
            query = query.filter(
                ArbitraryDatabaseTable.id > last_element_id
            )

        query = query.limit(PAGINATION_BATCH_SIZE)

        rows = query.all()
        for row in rows:
            yield row

        if len(rows) < PAGINATION_BATCH_SIZE:
            break

        last_element_id = row.id

The End