Quickstart
In this guide, you will learn how to get started with vector search in TiDB using Python SDK. Follow along to build your first AI application working with TiDB.
Prerequisites
- Go tidbcloud.com to create a TiDB Cloud Serverless cluster for free or using tiup playground to a TiDB Self-Managed cluster for local testing.
Installation
pytidb is the official Python SDK for TiDB, designed to help developers build AI applications efficiently.
To install the Python SDK, run the following command:
To use built-in embedding function, install the models
extension (alternative):
Connect to database
You can get these connection parameters from the TiDB Cloud console:
- Navigate to the Clusters page, and then click the name of your target cluster to go to its overview page.
- Click Connect in the upper-right corner. A connection dialog is displayed, with connection parameters listed.
For example, if the connection parameters are displayed as follows:
HOST: gateway01.us-east-1.prod.shared.aws.tidbcloud.com
PORT: 4000
USERNAME: 4EfqPF23YKBxaQb.root
PASSWORD: abcd1234
DATABASE: test
CA: /etc/ssl/cert.pem
The corresponding Python code to connect to the TiDB Cloud Serverless cluster would be as follows:
from pytidb import TiDBClient
db = TiDBClient.connect(
host="gateway01.us-east-1.prod.shared.aws.tidbcloud.com",
port=4000,
username="4EfqPF23YKBxaQb.root",
password="abcd1234",
database="test",
)
Note: The preceding example is for demonstration purposes only. You need to fill in the parameters with your own values and keep them secure.
Once connected, you can use the db
object to operate tables, query data, and more.
Create an embedding function
When working with embedding models, you can leverage the embedding function to automatically vectorize your data at both insertion and query stages. It natively supports popular embedding models like OpenAI, Jina AI, Hugging Face, Sentence Transformers, and others.
Go OpenAI platform to create your API key for embedding.
Go Jina AI to create your API key for embedding.
Create a table
As an example, create a table named chunks
with the following columns:
id
(int): the ID of the chunk.text
(text): the text content of the chunk.text_vec
(vector): the vector embeddings of the text.user_id
(int): the ID of the user who created the chunk.
Once created, you can use the table
object to insert data, search data, and more.
Insert Data
Now let's add some sample data to our table.
table.bulk_insert([
# 👇 The text will be automatically embedded and populated into the `text_vec` field.
Chunk(text="PyTiDB is a Python library for developers to connect to TiDB.", user_id=2),
Chunk(text="LlamaIndex is a framework for building AI applications.", user_id=2),
Chunk(text="OpenAI is a company and platform that provides AI models service and tools.", user_id=3),
])
Search for nearest neighbors
To search for nearest neighbors of a given query, you can use the table.search()
method, it will perform a vector search by default.
table.search(
# 👇 Pass the query text directly, it will be embedded to a query vector automatically.
"A library for my artificial intelligence software"
)
.limit(3).to_list()
In this example, vector search compares the query vector with the stored vectors in the text_vec
field of the chunks
table and returns the top 3 most semantically relevant results based on similarity scores.
The closer _distance
means the more similar the two vectors are.
[
{
'id': 2,
'text': 'LlamaIndex is a framework for building AI applications.',
'text_vec': [...],
'user_id': 2,
'_distance': 0.5719928358786761,
'_score': 0.4280071641213239
},
{
'id': 3,
'text': 'OpenAI is a company and platform that provides AI models service and tools.',
'text_vec': [...],
'user_id': 3,
'_distance': 0.603133726213383,
'_score': 0.396866273786617
},
{
'id': 1,
'text': 'PyTiDB is a Python library for developers to connect to TiDB.',
'text_vec': [...],
'user_id': 2,
'_distance': 0.6202191842385758,
'_score': 0.3797808157614242
}
]
Delete data
To delete a specific row from the table, you can use the table.delete()
method:
Drop table
When you no longer need a table, you can drop it using the db.drop_table()
method:
Next steps
- Learn more details about Vector Search, Fulltext Search and Hybrid Search in TiDB.