-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hierarchical structure to the graph index #402
Conversation
…where the node is present
vels to candidates pool in layer 0
… no dupes are submitted
…e in all the corresponding layers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few very minor comments, and one tiny bug I missed in the first pass of review.
jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphIndex.java
Outdated
Show resolved
Hide resolved
jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java
Outdated
Show resolved
Hide resolved
jvector-base/src/main/java/io/github/jbellis/jvector/graph/NodeQueue.java
Outdated
Show resolved
Hide resolved
jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphIndex.java
Outdated
Show resolved
Hide resolved
for (int i = 0; i < other.graph.getIdUpperBound(); i++) { | ||
if (!other.graph.containsNode(i)) { | ||
continue; | ||
IntStream.range(0, other.graph.getIdUpperBound()).parallel().forEach(i -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, confirming the change captured my intent.
jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndexWriter.java
Outdated
Show resolved
Hide resolved
rerankedResults.setMaxSize(topK); | ||
|
||
// add evicted results from the last call back to the candidates | ||
var previouslyEvicted = evictedResults.size() > 0 ? new SparseBits() : Bits.NONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use evictedResults, but not the previouslyEvicted bitset created/populated here (https://github.com/datastax/jvector/blob/main/jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java#L302). We used to use this bitset in resume
(https://github.com/datastax/jvector/blob/main/jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java#L302), but we don't in the new hierarchical structure. I think we can keep evictedResults but don't need to spend the CPU to create/update previouslyEvicted (it's set but never read).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the PR!
static List<Integer> sortedNodes(GraphIndex h) { | ||
var graphNodes = h.getNodes(); | ||
static List<Integer> sortedNodes(GraphIndex h, int level) { | ||
var graphNodes = h.getNodes(level); // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is marked TODO without explanation. Is there something remaining?
|
||
// Add 3 nodes | ||
builder.addGraphNode(0, ravv.getVector(0)); | ||
builder.addGraphNode(1, ravv.getVector(1)); | ||
builder.addGraphNode(2, ravv.getVector(2)); | ||
var neighbors = builder.graph.getNeighbors(0); | ||
var neighbors = builder.graph.getNeighbors(0, 0); // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO? Something missing?
@@ -78,14 +75,14 @@ public void testFusedGraph() throws Exception { | |||
var fusedScoreFunction = cachedOnDiskView.approximateScoreFunctionFor(queryVector, similarityFunction); | |||
var ordinal = getRandom().nextInt(graph.size()); | |||
// first pass compares fused ADC's direct similarity to reranker's similarity, used for comparisons to a specific node | |||
var neighbors = cachedOnDiskView.getNeighborsIterator(ordinal); | |||
var neighbors = cachedOnDiskView.getNeighborsIterator(0, ordinal); // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO? Something left to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some TODOs remaining but seem minor. A major contribution. Will affect public interfaces but it is the path forward, so worth it!
This PR introduces a hybrid HNSW-DiskANN graph. From HNSW, we take the idea of using multiple layers. This adds robustness to particularly hard datasets. Each layer is a Vamana graph. The upper layers reside in memory while the base layer resides on disk (in DiskANN style). It also enable using a single layer. In that case, it is a plain Vamana graph.