Skip to content

Filtering

As a relational database, TiDB supports a rich set of SQL operators and allows flexible combinations of filtering conditions that enable you to query your data precisely.

Overview

You can not only apply filtering on scalar fields but also on JSON fields. Filtering on JSON fields is often used for metadata filtering in vector search.

For PyTiDB, you can apply filtering by passing a filters parameter to the table.query(), table.delete(), table.update(), and table.search() methods.

The filters parameter supports two formats: Dictionary Filters and SQL String Filters.

Dictionary Filters

PyTiDB allows you to define filter conditions using a Python dictionary with operators as the filters parameter.

The dictionary structure of filters is as follows:

{
    "<key>": {
        "<operator>": <value>
    },
    ...
}

Example: Filter records where created_at is greater than 2024-01-01

table.query({
    # The `created_at` is a scalar field with DATETIME type
    "created_at": {
        "$gt": "2024-01-01"
    }
})

Example: Filter records where meta.category is in the array ["tech", "science"]

results = (
    table.search("some query", search_type="vector")
        .filter({
            # The `meta` is a JSON field, and its value is a JSON object like {"category": "tech"}
            "meta.category": {
                "$in": ["tech", "science"]
            }
        })
        .limit(10)
        .to_list()
)

Compare operators

You can use the following compare operators to filter records:

Operator Description
$eq Equal to value
$ne Not equal to value
$gt Greater than value
$gte Greater than or equal to value
$lt Less than value
$lte Less than or equal to value

Example: filter records where user_id is equal to 1

{
    "user_id": {
        "$eq": 1
    }
}

You can omit the $eq operator. The following query is equivalent to the above:

{
    "user_id": 1
}

Inclusion operators

You can use the following inclusion operators to filter records:

Operator Description
$in In array (string, int, or float)
$nin Not in array (string, int, float)

Example: Filter records where category is in the array ["tech", "science"]

{
    "category": {
        "$in": ["tech", "science"]
    }
}

Logical operators

You can use the logical operators $and and $or to combine multiple filters.

Operator Description
$and Returns results that match all filters in the list
$or Returns results that match any filter in the list

Syntax for using $and or $or:

{
    "$and|$or": [
        {
            "field_name": {
                <operator>: <value>
            }
        },
        {
            "field_name": {
                <operator>: <value>
            }
        }
        ...
    ]
}

Example: using $and to combine multiple filters:

{
    "$and": [
        {
            "created_at": {
                "$gt": "2024-01-01"
            }
        },
        {
            "meta.category": {
                "$in": ["tech", "science"]
            }
        }
    ]
}

SQL String Filters

You can also use a SQL string as the filters parameter. The string should be a valid SQL WHERE clause (without the WHERE keyword) using TiDB's SQL syntax.

Example: Filter records where created_at is greater than 2024-01-01

results = table.query(
    filters="created_at > '2024-01-01'",
    limit=10
).to_list()

Example: Filter records where the JSON field meta.category equals 'tech'

results = table.query(
    filters="meta->>'$.category' = 'tech'",
    limit=10
).to_list()

You can combine multiple conditions using AND, OR, and parentheses, and use any SQL operators supported by TiDB.

Warning

When using SQL string filters with dynamic user input, always validate the input to prevent SQL injection vulnerabilities.