Filtering
As a relational database, TiDB supports a rich set of SQL operators and allows flexible combinations of filtering conditions that enable you to query your data precisely.
Overview
You can not only apply filtering on scalar fields but also on JSON fields. Filtering on JSON fields is often used for metadata filtering in vector search.
For PyTiDB, you can apply filtering by passing a filters parameter to the table.query()
, table.delete()
, table.update()
, and table.search()
methods.
The filters parameter supports two formats: Dictionary Filters and SQL String Filters.
Dictionary Filters
PyTiDB allows you to define filter conditions using a Python dictionary with operators as the filters parameter.
The dictionary structure of filters is as follows:
<key>
: The key can be a column name, a JSON path expression to access a JSON field (see Metadata filtering), or a logical operator.<operator>
: The operator can be a compare operator or an inclusion operator.<value>
: The value can be a scalar value, an array, it depends on the operator.
Example: Filter records where created_at
is greater than 2024-01-01
table.query({
# The `created_at` is a scalar field with DATETIME type
"created_at": {
"$gt": "2024-01-01"
}
})
Example: Filter records where meta.category
is in the array ["tech", "science"]
results = (
table.search("some query", search_type="vector")
.filter({
# The `meta` is a JSON field, and its value is a JSON object like {"category": "tech"}
"meta.category": {
"$in": ["tech", "science"]
}
})
.limit(10)
.to_list()
)
Compare operators
You can use the following compare operators to filter records:
Operator | Description |
---|---|
$eq |
Equal to value |
$ne |
Not equal to value |
$gt |
Greater than value |
$gte |
Greater than or equal to value |
$lt |
Less than value |
$lte |
Less than or equal to value |
Example: filter records where user_id
is equal to 1
You can omit the $eq
operator. The following query is equivalent to the above:
Inclusion operators
You can use the following inclusion operators to filter records:
Operator | Description |
---|---|
$in |
In array (string, int, or float) |
$nin |
Not in array (string, int, float) |
Example: Filter records where category
is in the array ["tech", "science"]
Logical operators
You can use the logical operators $and
and $or
to combine multiple filters.
Operator | Description |
---|---|
$and |
Returns results that match all filters in the list |
$or |
Returns results that match any filter in the list |
Syntax for using $and
or $or
:
{
"$and|$or": [
{
"field_name": {
<operator>: <value>
}
},
{
"field_name": {
<operator>: <value>
}
}
...
]
}
Example: using $and
to combine multiple filters:
{
"$and": [
{
"created_at": {
"$gt": "2024-01-01"
}
},
{
"meta.category": {
"$in": ["tech", "science"]
}
}
]
}
SQL String Filters
You can also use a SQL string as the filters
parameter. The string should be a valid SQL WHERE
clause (without the WHERE
keyword) using TiDB's SQL syntax.
Example: Filter records where created_at
is greater than 2024-01-01
Example: Filter records where the JSON field meta.category
equals 'tech'
You can combine multiple conditions using AND
, OR
, and parentheses, and use any SQL operators supported by TiDB.
Warning
When using SQL string filters with dynamic user input, always validate the input to prevent SQL injection vulnerabilities.