Auto Embedding
Auto embedding is a feature that allows you to automatically generate vector embeddings for text data.
Tip
For the complete example code, see the auto embedding example.
Basic Usage
This example uses embedding models hosted on TiDB Cloud for demonstration. For other providers, see the Supported Providers list.
Step 1. Define an embedding function
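A minimal sketch of this step, assuming pytidb exposes `EmbeddingFunction` in `pytidb.embeddings` and accepts a `model_name` argument. The helper name `make_embed_func` is hypothetical, and the model identifier is an assumed example of a TiDB Cloud hosted model; check the Supported Providers list for the exact names available to you.

```python
# Assumed identifier for a TiDB Cloud hosted embedding model (verify before use).
MODEL_NAME = "tidbcloud_free/amazon/titan-embed-text-v2"


def make_embed_func(model_name: str = MODEL_NAME):
    """Build an embedding function for the given model (hypothetical helper)."""
    # Imported lazily so this module still loads where pytidb is not installed.
    from pytidb.embeddings import EmbeddingFunction

    return EmbeddingFunction(model_name=model_name)
```

Calling `embed_func = make_embed_func()` would produce the `embed_func` object that the following steps rely on.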
Step 2. Create a table and a vector field
Use embed_func.VectorField() to create a vector field on the table.
To enable auto embedding, set source_field to the field that you want to embed.
from pytidb.schema import TableModel, Field
from pytidb.datatype import TEXT

class Chunk(TableModel):
    id: int = Field(primary_key=True)
    text: str = Field(sa_type=TEXT)
    text_vec: list[float] = embed_func.VectorField(source_field="text")

table = client.create_table(schema=Chunk, if_exists="overwrite")
You don't need to specify the dimensions parameter; it is determined automatically by the embedding model.
However, you can specify the dimensions parameter to override the default dimension.
Step 3. Insert some sample data
Insert some sample data into the table.
table.bulk_insert([
    Chunk(text="TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."),
    Chunk(text="PyTiDB is a Python library for developers to connect to TiDB."),
    Chunk(text="LlamaIndex is a Python library for building AI-powered applications."),
])
When inserting data, the text_vec field is automatically populated with vector embeddings generated from the text field.
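To make the idea of populating a vector field on insert concrete, here is a self-contained toy model in plain Python. It is not how pytidb is implemented: `toy_embed`, `ToyChunk`, and `ToyTable` are hypothetical names, and the "embedding" is a hash-based stand-in rather than a real model. It only illustrates the flow of reading a source field and writing the result into a vector field.

```python
import hashlib
from dataclasses import dataclass, field


def toy_embed(text: str, dimensions: int = 4) -> list[float]:
    """Hypothetical stand-in for an embedding model (not a real one).

    Maps text deterministically to `dimensions` floats in [0, 1].
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
    if not 1 <= dimensions <= len(digest):
        raise ValueError("dimensions must be between 1 and 32")
    return [byte / 255.0 for byte in digest[:dimensions]]


@dataclass
class ToyChunk:
    text: str
    text_vec: list[float] = field(default_factory=list)


class ToyTable:
    """Illustrative in-memory table that fills a vector field on insert."""

    def __init__(self, source_field: str, vector_field: str):
        self.source_field = source_field
        self.vector_field = vector_field
        self.rows: list[ToyChunk] = []

    def bulk_insert(self, rows: list[ToyChunk]) -> list[ToyChunk]:
        for row in rows:
            # Read the source field and store its embedding in the vector field.
            source_text = getattr(row, self.source_field)
            setattr(row, self.vector_field, toy_embed(source_text))
            self.rows.append(row)
        return rows


toy_table = ToyTable(source_field="text", vector_field="text_vec")
inserted = toy_table.bulk_insert([
    ToyChunk(text="TiDB is a distributed database."),
    ToyChunk(text="PyTiDB is a Python library."),
])
print(len(inserted[0].text_vec))  # 4
```

The caller never sets `text_vec` directly; the table derives it from `text` at insert time, which mirrors the behavior described for the real table at a conceptual level.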