Vector Search with Qdrant
Table of Contents
- Introduction
- What is a Vector?
- How Do We Measure Closeness?
- From Theory to Practice
- Step 0: Setup
- Step 1: Collecting Documents
- Step 2: Embedding the Documents
- Step 3: Initialising QdrantClient
- Step 4: Creating a Collection
- Step 5: Inserting Documents
- Step 6: Querying for Answers
- Step 7: Putting It All Together
- Conclusion
- Resources
Introduction
In course 1, we learned how to implement RAG (Retrieval-Augmented Generation) with Elasticsearch. Now, it's time to step up a notch and explore vector search, a powerful technique for finding relevant information quickly, even in massive datasets.
What is a Vector?
A vector is just an array of numbers. In NLP, we use models (like BERT or sentence transformers) to convert text (sentences, paragraphs, documents) into vectors; these are called embeddings.
flowchart LR
A["Text Input:<br/>How to bake bread?"] --> B["Embedding Model<br/>(pre-trained)"]
B --> C["Output Embedding Vector:<br/>[0.12, -0.34, 0.56, ..., 0.01]"]
These vectors capture the meaning of the text, so similar texts have similar vectors.
How Do We Measure Closeness?
Now that we know what a vector is, how do we measure if two sentences are similar? The most common ways are:
1. Cosine Similarity
Cosine similarity measures the angle between two vectors. It ranges from -1 (opposite) to 1 (identical). The higher the cosine similarity, the more similar the meaning.
Formula:
cosine_similarity(A, B) = (A · B) / (||A|| * ||B||)
where A · B is the dot product of vectors A and B, and ||A|| is the magnitude (length) of vector A.
Example: Suppose we have two vectors:
- A = [1, 2]
- B = [2, 4]
The cosine similarity is:
(1*2 + 2*4) / (sqrt(1^2 + 2^2) * sqrt(2^2 + 4^2))
= (2 + 8) / (sqrt(5) * sqrt(20))
= 10 / sqrt(100)
= 10 / 10
= 1
The cosine similarity is exactly 1: since B = 2A, the two vectors point in the same direction, so they are maximally similar in this measure.
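You can verify this arithmetic with a few lines of NumPy. This is just a sketch of the formula above, not part of the tutorial's pipeline:
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(np.array([1, 2]), np.array([2, 4])))  # 1.0 (up to floating-point noise)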
2. Euclidean Distance
Euclidean distance measures the straight-line distance between two points (vectors). The smaller the distance, the more similar the vectors.
Formula:
euclidean_distance(A, B) = sqrt((a1 - b1)^2 + (a2 - b2)^2 + ... + (an - bn)^2)
Example: For A = [1, 2], B = [2, 4]:
sqrt((1-2)^2 + (2-4)^2) = sqrt(1 + 4) = sqrt(5) ≈ 2.236
A smaller distance means more similarity.
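The same check for Euclidean distance, again a minimal NumPy sketch of the formula:
import numpy as np

def euclidean_distance(a, b):
    # Straight-line distance: the L2 norm of the difference vector
    return np.linalg.norm(a - b)

print(euclidean_distance(np.array([1, 2]), np.array([2, 4])))  # ≈ 2.236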
From Theory to Practice
Now that we understand how vectors represent meaning and how we can measure their similarity, let's see how this works in a real-world application. In the next steps, we'll build a simple vector search system using Qdrant and FastEmbed. We'll go through setting up the environment, preparing data, generating embeddings, and running semantic searches.
Step 0: Setup
Qdrant is fully open-source, so you can run it however you like: self-hosted, on Kubernetes, or in the cloud. For this tutorial, we'll run Qdrant in a Docker container.
Running Qdrant with Docker
Just pull the image and start the container:
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \
-v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
qdrant/qdrant
- The -v flag mounts local storage so your data persists even if you restart or delete the container.
- Port 6333 is for the REST API, and 6334 is for gRPC.
- Qdrant's built-in Web UI is available at http://localhost:6333/dashboard.
Step 1: Collecting Documents
Let's fetch some documents to use in this tutorial:
import requests
docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()
documents = []
for course in documents_raw:
course_name = course['course']
for doc in course['documents']:
doc['course'] = course_name
documents.append(doc)
Here, we're downloading a JSON file with course documents and flattening them into a list.
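A quick sanity check on what we loaded (the exact count depends on the source file):
print(len(documents))       # total number of flattened documents
print(documents[0].keys())  # e.g. dict_keys(['text', 'section', 'question', 'course'])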
Step 2: Embedding the Documents
Weâll use FastEmbed as our embedding provider. You can list supported models like this:
from fastembed import TextEmbedding
TextEmbedding.list_supported_models()
Let's use the BAAI/bge-small-en model:
model_name = 'BAAI/bge-small-en'
model = TextEmbedding(model_name=model_name)
Model info:
- 384 dimensions
- English, 512 input tokens
- Small size (0.13GB)
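As a quick sanity check, you can embed a sample sentence and confirm the dimensionality. Note that embed() returns a generator, so we take the first item with next():
sample_vector = next(model.embed(['How do I join the course?']))
print(len(sample_vector))  # 384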
Step 3: Initialising QdrantClient
from qdrant_client import QdrantClient, models
qd_client = QdrantClient("http://localhost:6333")
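If the client can reach the server, listing the collections on it should succeed:
print(qd_client.get_collections())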
Step 4: Creating a Collection
We'll create a collection called llm-zoom-camp. The vector size is 384 (matching our embedding model), and we'll use cosine similarity.
qd_client.create_collection(
collection_name='llm-zoom-camp',
vectors_config=models.VectorParams(
size=384, # Embedding dimensionality of BAAI/bge-small-en
distance=models.Distance.COSINE
)
)
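If you re-run this code, create_collection will raise an error because the collection already exists. One way to guard against that (assuming a recent qdrant-client version, which provides collection_exists) is:
if not qd_client.collection_exists('llm-zoom-camp'):
    qd_client.create_collection(
        collection_name='llm-zoom-camp',
        vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE)
    )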
Step 5: Inserting Documents
Now, letâs embed our documents and insert them into Qdrant.
points = []

for i, doc in enumerate(documents):
    text = doc['question'] + ' ' + doc['text']
    # embed() returns a generator, so take the first (and only) result
    vector = next(model.embed([text]))
    point = models.PointStruct(
        id=i,
        vector=vector.tolist(),
        payload=doc
    )
    points.append(point)
Here, we concatenate the question and text, generate an embedding, and create a point for each document. After that, the generated points will be upserted into the collection, and the vector index will be built.
qd_client.upsert(
collection_name='llm-zoom-camp',
points=points
)
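Once the upsert completes, you can verify that all points landed in the collection:
print(qd_client.count(collection_name='llm-zoom-camp'))  # the count should match len(documents)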
Step 6: Querying for Answers
Letâs try to get an answer for the question:
I just discovered the course. Can I join now?
question = "I just discovered the course. Can I join now?"
query_vector = next(model.embed([question])).tolist()  # embed() returns a generator
query_points = qd_client.search(
collection_name='llm-zoom-camp',
query_vector=query_vector,
limit=5,
with_payload=True
)
Example result:
{
'text': 'Yes, you can. You won’t be able to submit some of the homeworks, but you can still take part in the course.\nIn order to get a certificate, you need to submit 2 out of 3 course projects and review 3 peers’ Projects by the deadline. It means that if you join the course at the end of November and manage to work on two projects, you will still be eligible for a certificate.',
'section': 'General course-related questions',
'question': 'The course has already started. Can I still join it?',
'course': 'machine-learning-zoomcamp'
}
The model found a document with a question similar to ours ("The course has already started. Can I still join it?") and returned a relevant answer. This shows how vector search can retrieve semantically similar information, even if the wording is different.
Step 7: Putting It All Together
Letâs wrap this up in a function:
def vector_search(question):
    print('vector_search is used')

    query_vector = next(model.embed([question])).tolist()  # embed() returns a generator

    query_points = qd_client.search(
        collection_name='llm-zoom-camp',
        query_vector=query_vector,
        limit=5,
        with_payload=True
    )

    results = []
    for point in query_points:
        results.append(point.payload)

    return results
And integrate with a RAG pipeline:
def rag(query):
search_results = vector_search(query)
prompt = build_prompt(query, search_results)
answer = llm(prompt)
return answer
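Here, build_prompt and llm are the helpers from the course 1 RAG pipeline. As a reminder, a minimal sketch of what they might look like, assuming the OpenAI Python client (the prompt template and model name are illustrative, not the course's exact code):
from openai import OpenAI

client = OpenAI()

def build_prompt(query, search_results):
    # Pack the retrieved payloads into the context section of the prompt
    context = '\n\n'.join(
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    )
    return (
        "You're a course teaching assistant. Answer the QUESTION based on the CONTEXT.\n\n"
        f"QUESTION: {query}\n\nCONTEXT:\n{context}"
    )

def llm(prompt):
    # Single-turn completion; any chat model works here
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': prompt}]
    )
    return response.choices[0].message.content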
Example usage:
rag('I just discovered the course. Can I join now?')
Sample output:
"Yes, you can still join the course now. Even if you don't register, you're still eligible to submit the homeworks. However, keep in mind that there will be deadlines for the final projects, so it's best not to leave everything for the last minute."
Conclusion
Vector search is a game-changer for finding relevant information in large datasets. By converting text into embeddings, we can search by meaning, not just keywords. With tools like Qdrant and FastEmbed, it's easier than ever to build powerful, semantic search systems. Whether you're building a chatbot, a document search engine, or a RAG pipeline, vector search will help you deliver smarter, more relevant results.
Resources
Here are some useful links if you want to learn more or try things yourself:
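- Qdrant documentation: https://qdrant.tech/documentation/
- FastEmbed: https://github.com/qdrant/fastembed
- Qdrant Python client: https://github.com/qdrant/qdrant-client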