Introduction to USearch: Core Concepts & Installation
Welcome to Chapter 2! In the previous chapter, we explored the fascinating world of vector embeddings and how they allow us to represent complex data like text or images as numerical vectors. Now, it’s time to learn how to efficiently search through these vectors to find similar items. This is where USearch comes in!
This chapter will be your friendly guide to USearch, an incredibly fast and lightweight library for Approximate Nearest Neighbor (ANN) search. We’ll demystify its core concepts, walk through the straightforward installation process, and get our hands dirty with our very first vector search using Python. By the end, you’ll have a solid foundation for using USearch, paving the way for its powerful integration with ScyllaDB. Ready to dive in? Let’s go!
Prerequisites
Before we jump in, make sure you have:
- A basic understanding of vector embeddings (covered in Chapter 1).
- Python 3.8+ installed on your system.
- pip, Python’s package installer, updated to its latest version.
Core Concepts of USearch
Imagine you have millions of books, and you want to find all books “similar” in content to the one you just read. Reading every book would take forever! Vector search, powered by libraries like USearch, helps you do this almost instantly.
What is USearch?
USearch is an extremely fast and memory-efficient open-source library developed by Unum for performing Approximate Nearest Neighbor (ANN) search. It’s built on a highly optimized C++ core, with bindings available for various languages, including Python, Rust, and Java. Its primary goal is to find vectors that are “closest” to a given query vector in a high-dimensional space.
Why “Approximate”? Great question! When dealing with millions or even billions of vectors, finding the absolute closest vector (Exact Nearest Neighbor) can be computationally very expensive and slow. ANN algorithms, like those in USearch, sacrifice a tiny bit of accuracy for massive gains in speed and scalability. This means USearch finds vectors that are very likely the closest, often indistinguishable from the exact closest in practical applications, but much, much faster.
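To make "approximate" concrete, ANN accuracy is usually measured as recall@k. This is the fraction of the true k nearest neighbors, found by an exhaustive scan, that the approximate search also returned. The sketch below is plain Python, not USearch code. The sample points and the "approximate" result list are made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(points, query, k):
    """Exact nearest neighbors: score every point and keep the k closest (O(N * d) work)."""
    ranked = sorted(range(len(points)), key=lambda i: squared_l2(points[i], query))
    return ranked[:k]


def recall_at_k(approx_keys, exact_keys):
    """Fraction of the true neighbors that the approximate search also found."""
    exact = set(exact_keys)
    return len(exact.intersection(approx_keys)) / len(exact)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
query = (0.2, 0.1)

exact = exact_knn(points, query, k=3)  # [0, 1, 2]
approx = [0, 2, 3]                     # hypothetical ANN output that missed point 1
print(exact, recall_at_k(approx, exact))
```

Here the hypothetical approximate search found 2 of the 3 true neighbors, so recall@3 is about 0.67. A good ANN index keeps this number close to 1.0 while scanning far fewer vectors than the exact search.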
The Magic Behind the Speed: HNSW
USearch primarily leverages an algorithm called Hierarchical Navigable Small Worlds (HNSW). Don’t worry about memorizing the name, but here’s the core idea:
Think of HNSW as building a multi-layered graph where each vector is a “node.”
- Bottom Layer: Contains all vectors, connected to their immediate neighbors.
- Upper Layers: Contain fewer vectors, acting as “expressways” to quickly jump across the graph.
When you perform a search, USearch starts at an upper layer, quickly navigating to the general area where your query vector’s neighbors might be. Then, it drops down to lower layers, refining the search until it finds the closest approximate neighbors. This hierarchical structure allows for incredibly fast lookups, even in massive datasets.
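The layered search described above can be sketched in a few lines of plain Python. This is a deliberately simplified toy, not USearch's implementation. It uses hand-built layers over 1-D points and a purely greedy walk. Real HNSW builds its layers automatically and keeps a list of candidate nodes at each layer rather than a single one.

```python
def distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search(graph, points, entry, query):
    """Within one layer, keep moving to the neighbor closest to the query until none is closer."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: distance(points[n], query), default=current)
        if distance(points[best], query) >= distance(points[current], query):
            return current
        current = best


def layered_search(layers, points, entry, query):
    """Search from the sparse top layer down to the dense bottom layer, reusing each result as the next entry point."""
    for graph in layers:
        entry = greedy_search(graph, points, entry, query)
    return entry


# Ten points on a line: point i sits at coordinate i.
points = {i: (float(i),) for i in range(10)}
# Bottom layer: every point, linked to its immediate neighbors.
bottom = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
# Upper layer: only a few points, linked by long "expressway" edges.
top = {0: [4], 4: [0, 8], 8: [4]}

print(layered_search([top, bottom], points, entry=0, query=(7.2,)))  # 7
```

On the top layer the walk jumps from 0 to 4 to 8 in two hops. The bottom layer then needs only one more step to reach 7, instead of walking through every point from 0 to 7.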
Key Features of USearch
- Blazing Fast: Designed for high-performance, low-latency search.
- Memory Efficient: Optimized to handle large datasets within reasonable memory constraints.
- Scalable: Can handle millions to billions of vectors.
- Flexible: Supports various distance metrics (like Cosine, Euclidean, etc.) and data types.
- Embeddable: Its lightweight nature makes it easy to integrate into applications.
USearch and ScyllaDB
You might be wondering, “How does this relate to ScyllaDB?” ScyllaDB, a real-time big data database, has integrated vector search capabilities directly into its core, leveraging libraries like USearch under the hood. This means you can store your vectors in ScyllaDB and perform lightning-fast similarity searches within the database itself, combining the power of a NoSQL database with cutting-edge vector search. We’ll explore this integration in depth in later chapters!
Step-by-Step Installation
Let’s get USearch installed on your system. We’ll be using its Python bindings for our examples.
Step 1: Prepare Your Python Environment
It’s always a good practice to use a virtual environment to manage your project’s dependencies.
Open your terminal or command prompt:
```shell
# Create a new virtual environment (if you don't have one)
python3 -m venv usearch_env

# Activate the virtual environment
# On macOS/Linux:
source usearch_env/bin/activate
# On Windows:
# usearch_env\Scripts\activate
```
You should see (usearch_env) at the beginning of your prompt, indicating the virtual environment is active.
Step 2: Install USearch
Now, let’s install the usearch Python package. USearch is under active development and new versions are released regularly, so we’ll simply install the latest stable release available via pip:
```shell
pip install usearch
```
This command downloads and installs the usearch package and its dependencies. It will compile the C++ core if a pre-built wheel isn’t available for your system, which might take a moment.
Step 3: Verify Installation
To ensure everything is installed correctly, let’s open a Python interpreter and try importing the library.
```shell
python
```
Once in the Python prompt (>>>), type:
```python
import usearch
print(usearch.__version__)
```
You should see the version number of the release that pip installed. If you see an ImportError instead, make sure the virtual environment is active and repeat the installation step.
Type exit() to leave the Python interpreter.
First Steps with USearch: Your First Vector Search
Now that USearch is installed, let’s write a small Python script to perform our very first vector search! We’ll create a simple index, add some dummy vectors, and then query it.
Create a new Python file named first_search.py.
Step 1: Import USearch
At the top of your first_search.py file, we’ll import the necessary components.
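```python
# first_search.py
import numpy as np
from usearch.index import Index, MetricKind
```
numpy is a widely used Python library for numerical operations, which is excellent for handling vectors. usearch.index is the module of our vector search library that provides the Index class and the MetricKind enumeration of distance metrics.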
Step 2: Define Our Vectors
Let’s create some simple 3-dimensional vectors. In a real-world scenario, these would be generated by an embedding model.
```python
# first_search.py
# ... (previous imports) ...

# Our sample vectors
vectors = np.array([
    [1.0, 1.0, 1.0],  # Vector 0 (ID 0)
    [2.0, 2.0, 2.0],  # Vector 1 (ID 1) - same direction as Vector 0, twice as long
    [0.9, 0.8, 1.1],  # Vector 2 (ID 2) - close to Vector 0 in straight-line distance
    [5.0, 5.0, 5.0],  # Vector 3 (ID 3)
    [0.1, 0.2, 0.3]   # Vector 4 (ID 4)
], dtype=np.float32)  # USearch often prefers float32 for performance
```
- We’re using np.array to create a NumPy array of our vectors.
- dtype=np.float32 is specified because float32 is commonly used for embeddings and is often more memory-efficient and faster for USearch.
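Here is a quick back-of-the-envelope check of the memory claim. The sketch compares the raw storage needed for a hypothetical corpus of one million 384-dimensional embeddings in float64 versus float32. It counts only the vectors themselves. An index also stores its graph links, which this calculation does not include.

```python
import numpy as np

num_vectors, ndim = 1_000_000, 384  # hypothetical corpus size and embedding width

bytes_f64 = num_vectors * ndim * np.dtype(np.float64).itemsize  # 8 bytes per value
bytes_f32 = num_vectors * ndim * np.dtype(np.float32).itemsize  # 4 bytes per value

print(f"float64: {bytes_f64 / 2**30:.2f} GiB")  # float64: 2.86 GiB
print(f"float32: {bytes_f32 / 2**30:.2f} GiB")  # float32: 1.43 GiB
```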
Step 3: Create a USearch Index
Now, let’s initialize our USearch index. We need to tell it the dimensionality of our vectors and the distance metric to use.
```python
# first_search.py
# ... (previous code) ...

# Define the dimensionality of our vectors
dimensions = vectors.shape[1]  # This will be 3 for our example

# Create a USearch index
# We specify the dimensions, the distance metric, and the stored scalar type.
# Metrics can also be given as strings, e.g. "cos" or "l2sq" (squared Euclidean).
index = Index(
    ndim=dimensions,
    metric=MetricKind.Cos,  # Cosine distance is common for embeddings
    dtype="f32",            # Store 32-bit floats, matching our np.float32 vectors
)

print(f"USearch index created with {dimensions} dimensions using Cosine similarity.")
```
- ndim: This crucial parameter tells USearch the number of dimensions in each of your vectors. It must match your data.
- metric: Specifies how closeness between vectors is measured. For both options below, smaller values mean more similar.
  - MetricKind.Cos: Cosine distance (1 - cosine similarity), popular for text embeddings. It depends only on the angle between vectors, not their length.
  - MetricKind.L2sq: Squared Euclidean distance, based on the straight-line distance between vectors.
- dtype: The scalar type the index stores internally. "f32" (32-bit float) matches our np.float32 vectors.
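To see how the two metrics can disagree, the sketch below computes both by hand with NumPy (not through USearch) between Vector 0 and Vectors 1 and 2 from our sample data. The cosine distance follows the 1 - cosine similarity convention. It is clamped at 0 because floating-point rounding can otherwise produce a tiny negative value.

```python
import numpy as np


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between a and b."""
    cos_sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(0.0, 1.0 - cos_sim)  # clamp tiny negative values caused by rounding


def l2_squared(a, b):
    """Squared Euclidean distance: depends on both direction and length."""
    diff = a - b
    return float(np.dot(diff, diff))


v0 = np.array([1.0, 1.0, 1.0], dtype=np.float32)
v1 = np.array([2.0, 2.0, 2.0], dtype=np.float32)  # same direction, twice the length
v2 = np.array([0.9, 0.8, 1.1], dtype=np.float32)  # slightly different direction, nearby point

for name, other in [("Vector 1", v1), ("Vector 2", v2)]:
    print(f"{name}: cosine={cosine_distance(v0, other):.4f}  l2sq={l2_squared(v0, other):.4f}")
# Vector 1: cosine=0.0000  l2sq=3.0000
# Vector 2: cosine=0.0088  l2sq=0.0600
```

Cosine distance considers Vector 1 the closer match because it points the same way. Squared Euclidean distance prefers Vector 2 because it sits closer in space.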
Step 4: Add Vectors to the Index
We’ll add our vectors to the index. Each vector needs a unique integer ID.
```python
# first_search.py
# ... (previous code) ...

# Add vectors to the index
# We'll use their array indices as their unique IDs (keys)
for i, vec in enumerate(vectors):
    index.add(i, vec)
    print(f"Added vector with ID {i} to the index.")

print(f"Index now contains {len(index)} vectors.")
```
- index.add(i, vec): Inserts a vector into the index. It can also take a batch of keys together with a matching 2-D array of vectors.
  - The first argument is a unique integer key for the vector. This is how you’ll map search results back to the original data associated with the vector later.
  - The second argument is the NumPy array holding the vector itself.
Step 5: Perform a Search
Now for the exciting part: querying the index to find similar vectors!
```python
# first_search.py
# ... (previous code) ...

# Define a query vector
# Let's try to find vectors similar to Vector 0 (ID 0)
query_vector = np.array([1.0, 1.0, 0.9], dtype=np.float32)  # Slightly different from Vector 0

# Perform a search
# We want the top 2 most similar neighbors
matches = index.search(query_vector, count=2)

print(f"\nSearching for top 2 neighbors of query vector: {query_vector}")
print("Found matches:")
for i, (key, distance) in enumerate(zip(matches.keys, matches.distances)):
    # USearch's cosine distance is 1 - cosine similarity:
    # 0 means same direction, 1 means orthogonal, 2 means opposite.
    similarity = 1 - distance  # Convert back to cosine similarity in [-1, 1]
    print(f"  Match {i+1}: ID={key}, Distance={distance:.4f}, Similarity={similarity:.4f}, Original Vector: {vectors[key]}")
```
- index.search(query_vector, count=2): This is the core search method.
  - The first argument is the query vector you want to find similar items to.
  - count is the number of nearest neighbors you want to retrieve.
- The matches object returned contains keys (the IDs of the matched vectors) and distances (how “far” each one is from the query vector), ordered from nearest to farthest.
- Important Note on Cosine Distance: USearch’s MetricKind.Cos calculates a “distance” equal to 1 - cosine similarity. It ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite directions), while a traditional cosine similarity ranges from 1 (same direction) to -1 (opposite). The conversion 1 - distance therefore recovers the familiar similarity score.
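Here is a quick numeric check of that mapping, done by hand with NumPy rather than through USearch. It uses three 2-D pairs that point in the same, orthogonal, and opposite directions, computes their cosine distance, and recovers the similarity as 1 - distance.

```python
import numpy as np

pairs = {
    "same direction": ([1.0, 0.0], [3.0, 0.0]),
    "orthogonal": ([1.0, 0.0], [0.0, 1.0]),
    "opposite": ([1.0, 0.0], [-1.0, 0.0]),
}

results = {}
for name, (a, b) in pairs.items():
    a, b = np.array(a), np.array(b)
    cos_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    distance = 1.0 - cos_sim  # cosine distance convention: 1 - similarity
    results[name] = (distance, 1.0 - distance)
    print(f"{name:>14}: distance={distance:.1f}  similarity={1.0 - distance:.1f}")
# same direction: distance=0.0  similarity=1.0
#     orthogonal: distance=1.0  similarity=0.0
#       opposite: distance=2.0  similarity=-1.0
```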
Step 6: Run Your Script
Save first_search.py and run it from your terminal:
```shell
python first_search.py
```
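You should see output along these lines:
```
USearch index created with 3 dimensions using Cosine similarity.
Added vector with ID 0 to the index.
Added vector with ID 1 to the index.
Added vector with ID 2 to the index.
Added vector with ID 3 to the index.
Added vector with ID 4 to the index.
Index now contains 5 vectors.

Searching for top 2 neighbors of query vector: [1.  1.  0.9]
Found matches:
  Match 1: ID=0, Distance=0.0012, Similarity=0.9988, Original Vector: [1. 1. 1.]
  Match 2: ID=1, Distance=0.0012, Similarity=0.9988, Original Vector: [2. 2. 2.]
```
The two matches will be two of IDs 0, 1, and 3, possibly in a different order from the one shown, and the last digit of the distances may vary slightly. This is expected. Cosine distance only looks at direction, and Vectors 0 ([1.0, 1.0, 1.0]), 1 ([2.0, 2.0, 2.0]), and 3 ([5.0, 5.0, 5.0]) all point in exactly the same direction. They are therefore tied at a distance of about 0.0012 from our query [1.0, 1.0, 0.9]. Vector 2 ([0.9, 0.8, 1.1]) is closer to the query in straight-line terms than Vectors 1 and 3, but its direction differs a little more, so it only comes fourth under cosine distance. If vector length matters for your data, MetricKind.L2sq would rank Vector 0 first (squared distance about 0.01) and Vector 2 second (about 0.09).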
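To confirm that tie without relying on USearch, the sketch below ranks all five sample vectors against the query by exact cosine distance using plain NumPy. This is a brute-force scan, not an index.

```python
import numpy as np

vectors = np.array([
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [0.9, 0.8, 1.1],
    [5.0, 5.0, 5.0],
    [0.1, 0.2, 0.3],
], dtype=np.float32)
query_vector = np.array([1.0, 1.0, 0.9], dtype=np.float32)

# Cosine similarity of the query with every row, then convert to cosine distance
cos_sim = (vectors @ query_vector) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vector))
distances = 1.0 - cos_sim

for key in np.argsort(distances, kind="stable"):
    print(f"ID={key}  distance={distances[key]:.4f}")
```

IDs 0, 1, and 3 come out at about 0.0012, Vector 2 at about 0.016, and Vector 4 at about 0.091. The same scan also works as a reference answer when you experiment with other query vectors.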
Mini-Challenge: Explore More Neighbors!
You’ve successfully performed your first vector search! Now, let’s try a small modification to solidify your understanding.
Challenge:
Modify the first_search.py script to:
- Change the query_vector to one that lies far away from all of our current vectors, for example, [10.0, 10.0, 10.0].
- Increase the count parameter in the index.search() call to 3.
- Observe the results. Do the top 3 matches make sense given your new query vector?
Hint: Think about how the new query_vector relates to the existing vectors array. Which existing vector is it “most similar” to, even if not perfectly identical?
What to Observe/Learn:
- How changing the query vector affects the search results.
- How the count parameter determines the number of neighbors returned.
- The relationship between distance and similarity scores.
Common Pitfalls & Troubleshooting
Even with simple examples, it’s easy to stumble. Here are a few common issues and how to tackle them:
Dimension Mismatch:
- Pitfall: Creating an index with ndim=3 but then trying to add a 4-dimensional vector or query with a 2-dimensional vector.
- Troubleshooting: Always ensure index.ndim exactly matches the dimensionality of all vectors you add and query. USearch will raise an error if dimensions don’t match.
Incorrect dtype:
- Pitfall: Passing Python lists or NumPy arrays with dtype=np.float64 (double precision) when the index was created to store 32-bit floats.
- Troubleshooting: Explicitly set dtype=np.float32 for your vectors and ensure the index is initialized with the matching scalar type. While float64 input usually works, float32 is generally preferred for performance and memory in vector search.
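Both pitfalls can be caught before a vector ever reaches the index. The helper below is hypothetical and not part of USearch. It converts any input to a contiguous float32 NumPy array and raises a clear error if the shape does not match the index’s dimensionality.

```python
import numpy as np


def prepare_vector(vector, ndim):
    """Hypothetical helper: coerce to float32 and check the dimensionality up front."""
    arr = np.ascontiguousarray(vector, dtype=np.float32)  # lists and float64 arrays become float32
    if arr.ndim != 1 or arr.shape[0] != ndim:
        raise ValueError(f"expected a 1-D vector with {ndim} values, got shape {arr.shape}")
    return arr


ok = prepare_vector([1.0, 1.0, 0.9], ndim=3)
print(ok.dtype, ok.shape)  # float32 (3,)

try:
    prepare_vector(np.ones(4), ndim=3)
except ValueError as err:
    print(err)  # expected a 1-D vector with 3 values, got shape (4,)
```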
No Results or Unexpected Results:
- Pitfall: The index is empty, or the metric chosen doesn’t suit your data.
- Troubleshooting:
  - Check len(index) to ensure vectors were added.
  - Verify the metric (e.g., MetricKind.Cos for text embeddings, MetricKind.L2sq for geometric distance) is appropriate for your use case. Remember that cosine distance ignores vector length, so vectors pointing in the same direction, such as [1, 1, 1] and [2, 2, 2], count as identical.
  - For cosine distance, use the 1 - distance conversion to get an intuitive similarity score.
Persistence (Forgetting to Save/Load):
- Pitfall: You build a large index, close your program, and then realize the index is gone.
- Troubleshooting: USearch indexes are in-memory by default. To persist them, you need to explicitly call index.save('path/to/index.usearch') and later index.load('path/to/index.usearch'). We’ll cover this in more detail in a future chapter, but it’s good to be aware of now!
Summary
Phew! You’ve just taken a significant step into the world of vector search. Here’s a quick recap of what we covered:
- USearch Fundamentals: Learned that USearch is a lightning-fast, memory-efficient open-source library for Approximate Nearest Neighbor (ANN) search.
- HNSW Algorithm: Got a high-level understanding of how USearch uses hierarchical graphs to achieve its incredible speed.
- Installation: Successfully installed the usearch Python package.
- First Search: Wrote and executed your first Python script to create a USearch index, add vectors, and perform a similarity search.
- Core Parameters: Understood the importance of ndim, metric (especially Cos and L2sq), and dtype.
- Troubleshooting: Identified common pitfalls like dimension mismatches and dtype issues.
You’re now equipped with the foundational knowledge and practical skills to start experimenting with USearch! In the next chapter, we’ll delve deeper into advanced indexing techniques and explore how to handle larger datasets more efficiently.
References
- USearch GitHub Repository - The official source for USearch.
- ScyllaDB Vector Search Press Release - Announcing general availability of ScyllaDB Vector Search (Jan 2026).
- ScyllaDB Documentation on Vector Search - Official documentation for ScyllaDB’s vector search feature.
- NumPy Documentation - For understanding numpy.array and data types.
This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.