Relaxes lexical search with word embeddings while preserving inverted-index efficiency.
Semantic search over massive text corpora
SoftMatcha is a large-scale text search system designed for semantic and soft matching over massive corpora.
Unlike traditional keyword-based tools such as grep or exact n-gram retrieval systems like Infini-gram, SoftMatcha enables retrieval based on semantic similarity rather than exact surface forms.
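To make the contrast with exact matching concrete, the toy sketch below illustrates the general idea of relaxing a lexical pattern with word embeddings while still answering the query from an inverted index. It is not SoftMatcha's actual implementation or API. The 2-D vectors in `TOY_EMBEDDINGS`, the function names `build_index` and `soft_search`, and the `threshold` value of 0.9 are all assumptions made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical 2-D "embeddings", hand-picked for illustration only.
# A real system would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "big": (1.0, 0.1),
    "large": (0.9, 0.2),
    "small": (-1.0, 0.1),
    "dog": (0.1, 1.0),
    "puppy": (0.2, 0.9),
    "cat": (-0.6, 0.6),
    "the": (0.0, -1.0),
    "saw": (0.7, -0.7),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_index(tokens):
    """Inverted index: word -> list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index


def soft_search(query, index, embeddings, threshold=0.9):
    """Return start positions where every query word softly matches.

    A corpus word matches a query word when their cosine similarity
    is at least `threshold` (an assumed, tunable value).
    """
    candidates = None
    for offset, qword in enumerate(query):
        starts = set()
        # Scan the vocabulary (not the corpus) for similar words,
        # then read their positions from the inverted index.
        for word, positions in index.items():
            if cosine(embeddings[qword], embeddings[word]) >= threshold:
                starts.update(p - offset for p in positions)
        # A match must be consistent across all query positions.
        candidates = starts if candidates is None else candidates & starts
    return sorted(candidates or [])


tokens = "the big dog saw the large puppy the small cat".split()
index = build_index(tokens)

print(soft_search(["big", "dog"], index, TOY_EMBEDDINGS))         # [1, 5]
print(soft_search(["big", "dog"], index, TOY_EMBEDDINGS, 0.999))  # [1]
```

With the looser threshold, the query "big dog" also finds "large puppy" at position 5, which an exact matcher would miss. Raising the threshold close to 1 reduces the search to exact matching. The similarity scan runs over the vocabulary rather than the corpus, so corpus positions are only ever read from the index.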
SoftMatcha ICLR 2025
A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches