No hits with search threshold 0 on documents containing words with common root #911

Open
fturmel opened this issue Mar 12, 2025 · 1 comment

fturmel commented Mar 12, 2025

Describe the bug

When doing full text search with threshold 0 on a document that contains a few words with common roots, we don't get a hit until we've typed enough characters to disambiguate them.

To Reproduce

Run the following searches with threshold 0:

On the indexed value "Phone, phonogram":

  • search for "p", "ph", "pho" or "phon" -> no hits (we should get a hit obviously)
  • search for "phone" or "phono" -> 1 hit (as expected)

On the indexed value "Bet, better":

  • search for "b", "be" or "bet" -> no hits (we should get a hit; this is even worse than the previous case because "bet" is actually a full word match)
  • search for "bett", "bette" or "better" -> 1 hit (as expected)
  • search for "bet hi" -> 1 hit (searching for an additional word now gives us a hit for "bet", puzzling...)

On the indexed value "Some random sentence":

  • search for "s" -> no hits (we have 2 words that start with s, should be getting a hit)
  • search for "r" -> 1 hit
  • search for "se" or "so" -> 1 hit

Expected behavior

See the expected results noted next to each reproduction case above.
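
To make the expected results concrete, here is a minimal toy model of the matching rule the reproduction cases imply. It is not Orama's implementation and does not call Orama. The helpers `tokenize` and `expectedHitCount` are hypothetical names, and the rule itself is an assumption drawn from the cases above: with threshold 0, a document is a hit only when every search token is a prefix of at least one of its words.

```typescript
// Toy model of the EXPECTED behavior only; this is not how Orama works internally.
// Assumption: tokens are lowercased runs of ASCII letters and digits.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0)
}

// Assumption: with threshold 0, every search token must be a prefix of
// at least one token in the document for the document to count as a hit.
function expectedHitCount(docs: string[], term: string): number {
  const queryTokens = tokenize(term)
  return docs.filter((doc) => {
    const docTokens = tokenize(doc)
    return queryTokens.every((q) => docTokens.some((d) => d.startsWith(q)))
  }).length
}

// The three indexed values from the reproduction steps.
const docs = ['Phone, phonogram', 'Bet, better', 'Some random sentence']

for (const term of ['p', 'bet', 's', 'bet hi']) {
  console.log(`"${term}" -> ${expectedHitCount(docs, term)} expected hit(s)`)
}
```

Under this model "p", "bet" and "s" each match exactly one document, while "bet hi" matches none because no indexed word starts with "hi".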

Environment Info

OS: macOS 15.3.2
Node: 22.14.0
Orama: 3.1.2

Affected areas

Search

Additional context

No response

fturmel commented Mar 12, 2025

@micheleriva here are the unit tests to add to packages/orama/tests/threshold.test.ts. 8 out of 14 are failing at the moment.

t.test('should return results for words with same root if threshold is 0', async t => {
  // related issue: https://github.com/oramasearch/orama/issues/911
  // Assumes the test file already imports `t` plus `create`, `insert` and `search`.

  const db = create({
    schema: {
      title: 'string'
    }
  })

  await insert(db, { title: 'Phone, phonogram' })
  await insert(db, { title: 'Bet, better' })
  await insert(db, { title: 'Some random sentence' })

  const testCases: [string, number][] = [
    ['p', 1],
    ['ph', 1],
    ['pho', 1],
    ['phone', 1],
    ['phono', 1],

    ['b', 1],
    ['be', 1],
    ['bet', 1],
    ['bett', 1],
    ['bet hi', 0], // the term "hi" is not in any document, there should be no hits with threshold 0

    ['s', 1],
    ['r', 1],
    ['se', 1],
    ['so', 1]
  ]

  t.plan(testCases.length)

  for (const [term, expectedCount] of testCases) {
    const result = await search(db, { term, threshold: 0 })
    t.same(
      result.count,
      expectedCount,
      `Search term "${term}" with threshold 0 should match ${expectedCount} record(s), but matched ${result.count}`
    )
  }
})
