Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hierarchical structure to the graph index #402

Merged
merged 131 commits into from
Mar 28, 2025
Merged

Add hierarchical structure to the graph index #402

merged 131 commits into from
Mar 28, 2025

Conversation

marianotepper
Copy link
Collaborator

@marianotepper marianotepper commented Mar 13, 2025

This PR introduces a hybrid HNSW-DiskANN graph. From HNSW, we take the idea of using multiple layers. This adds robustness to particularly hard datasets. Each layer is a Vamana graph. The upper layers reside in memory while the base layer resides on disk (in DiskANN style). It also enable using a single layer. In that case, it is a plain Vamana graph.

Jonathan Ellis and others added 30 commits February 7, 2025 17:21
Copy link
Collaborator

@jkni jkni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few very minor comments, and one tiny bug I missed in the first pass of review.

for (int i = 0; i < other.graph.getIdUpperBound(); i++) {
if (!other.graph.containsNode(i)) {
continue;
IntStream.range(0, other.graph.getIdUpperBound()).parallel().forEach(i -> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, confirming the change captured my intent.

rerankedResults.setMaxSize(topK);

// add evicted results from the last call back to the candidates
var previouslyEvicted = evictedResults.size() > 0 ? new SparseBits() : Bits.NONE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use evictedResults, but not the previouslyEvicted bitset created/populated here (https://github.com/datastax/jvector/blob/main/jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java#L302). We used to use this bitset in resume (https://github.com/datastax/jvector/blob/main/jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java#L302), but we don't in the new hierarchical structure. I think we can keep evictedResults but don't need to spend the CPU to create/update previouslyEvicted (it's set but never read).

@jkni jkni self-requested a review March 27, 2025 20:53
Copy link
Collaborator

@jkni jkni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR!

static List<Integer> sortedNodes(GraphIndex h) {
var graphNodes = h.getNodes();
static List<Integer> sortedNodes(GraphIndex h, int level) {
var graphNodes = h.getNodes(level); // TODO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is marked TODO without explanation. Is there something remaining?


// Add 3 nodes
builder.addGraphNode(0, ravv.getVector(0));
builder.addGraphNode(1, ravv.getVector(1));
builder.addGraphNode(2, ravv.getVector(2));
var neighbors = builder.graph.getNeighbors(0);
var neighbors = builder.graph.getNeighbors(0, 0); // TODO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO? Something missing?

@@ -78,14 +75,14 @@ public void testFusedGraph() throws Exception {
var fusedScoreFunction = cachedOnDiskView.approximateScoreFunctionFor(queryVector, similarityFunction);
var ordinal = getRandom().nextInt(graph.size());
// first pass compares fused ADC's direct similarity to reranker's similarity, used for comparisons to a specific node
var neighbors = cachedOnDiskView.getNeighborsIterator(ordinal);
var neighbors = cachedOnDiskView.getNeighborsIterator(0, ordinal); // TODO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO? Something left to do?

Copy link
Collaborator

@tlwillke tlwillke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some TODOs remaining but seem minor. A major contribution. Will affect public interfaces but it is the path forward, so worth it!

@tlwillke tlwillke merged commit d623b1f into main Mar 28, 2025
6 checks passed
@tlwillke tlwillke deleted the hnsw-3 branch March 28, 2025 05:40
@tlwillke tlwillke restored the hnsw-3 branch March 28, 2025 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants