What is SoftMatcha?

SoftMatcha is a large-scale text search system designed for soft (i.e., semantic) matching over massive corpora.

Unlike traditional keyword-based tools such as grep or exact n-gram retrieval systems like infini-gram, SoftMatcha enables retrieval based on semantic similarity rather than exact surface forms.
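To make the contrast with exact matching concrete, here is a minimal sketch in Python. It is not SoftMatcha's implementation or API: the toy 4-dimensional vectors in `EMB`, the function names `cosine` and `soft_find`, and the 0.8 threshold are all assumptions chosen for illustration. A pattern matches at a position when every pattern word is either identical to the corpus word or close to it in embedding space.

```python
import math

# Toy word vectors, invented for this example (not real embeddings).
EMB = {
    "the":    (0.0, 0.0, 0.0, 1.0),
    "cat":    (1.0, 0.0, 0.0, 0.0),
    "kitten": (0.9, 0.0, 0.1, 0.0),
    "car":    (0.0, 1.0, 0.0, 0.0),
    "sleeps": (0.0, 0.0, 1.0, 0.0),
    "naps":   (0.1, 0.0, 0.9, 0.0),
}

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def words_match(a, b, threshold):
    """Identical words always match; otherwise compare embeddings."""
    if a == b:
        return True
    if a in EMB and b in EMB:
        return cosine(EMB[a], EMB[b]) >= threshold
    return False

def soft_find(pattern, tokens, threshold=0.8):
    """Return start positions where every pattern word softly matches."""
    n = len(pattern)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if all(words_match(p, tokens[i + k], threshold)
               for k, p in enumerate(pattern))
    ]

TOKENS = "the kitten naps the car sleeps the cat sleeps".split()
```

With this toy data, `soft_find(["cat", "sleeps"], TOKENS)` also finds "kitten naps", while raising the threshold to 1.0 reduces the search to exact matching. The linear scan is only for clarity; it would not scale to the corpus sizes discussed here.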

SoftMatcha 2 (Latest Preprint)

A Fast and Soft Pattern Matcher for Trillion-Scale Corpora

Achieves fast semantic search on trillion-scale corpora using a suffix array and corpus-aware pruning, with new support for insertions and deletions in query patterns.

SoftMatcha (ICLR 2025)

A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches

Relaxes lexical searches with word embeddings while preserving inverted-index efficiency.
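One way to picture how embedding-based relaxation can sit on top of an inverted index is sketched below in Python. This is an illustrative assumption, not the paper's algorithm or code: the vectors in `VECTORS`, the helper names `build_index` and `soft_search`, and the 0.8 threshold are hypothetical. Each query word is expanded to the vocabulary words whose embeddings are similar enough, their posting lists are merged, and the shifted position sets are intersected across the pattern.

```python
import math
from collections import defaultdict

# Toy word vectors, invented for this example (not real embeddings).
VECTORS = {
    "the":    (0.0, 0.0, 0.0, 1.0),
    "cat":    (1.0, 0.0, 0.0, 0.0),
    "kitten": (0.9, 0.0, 0.1, 0.0),
    "car":    (0.0, 1.0, 0.0, 0.0),
    "sleeps": (0.0, 0.0, 1.0, 0.0),
    "naps":   (0.1, 0.0, 0.9, 0.0),
}

def similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_index(tokens):
    """Inverted index: word -> list of positions where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens):
        index[word].append(pos)
    return dict(index)

def soft_search(pattern, index, threshold=0.8):
    """Return start positions of soft matches using only posting lists."""
    candidates = None
    for offset, query in enumerate(pattern):
        starts = set()
        for word, positions in index.items():
            similar = word == query or (
                query in VECTORS and word in VECTORS
                and similarity(VECTORS[query], VECTORS[word]) >= threshold
            )
            if similar:
                # Shift each occurrence back to the implied pattern start.
                starts.update(p - offset for p in positions)
        candidates = starts if candidates is None else candidates & starts
    return sorted(p for p in (candidates or ()) if p >= 0)

CORPUS = "the kitten naps the car sleeps the cat sleeps".split()
INDEX = build_index(CORPUS)
```

Here the similarity check runs once per vocabulary word rather than once per corpus position, so the cost of relaxation grows with vocabulary size while the corpus is touched only through posting lists.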