SoftMatcha

Semantic search over massive text corpora


What is SoftMatcha?

SoftMatcha is a large-scale text search system designed for semantic and soft matching over massive corpora.

Unlike traditional keyword-based tools such as grep, or exact n-gram retrieval systems like Infini-gram, SoftMatcha retrieves text by semantic similarity rather than by exact surface form.

SoftMatcha ICLR 2025

A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches

Relaxes lexical search with word embeddings while preserving inverted-index efficiency.
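
To make the idea of relaxing lexical search with word embeddings concrete, here is a minimal Python sketch. It is not SoftMatcha's actual implementation or API: the embedding table `EMB`, the helper names (`cosine`, `build_index`, `soft_search`), the toy vectors, and the 0.9 threshold are all assumptions made up for illustration. It shows one plausible way to combine the two ingredients: each query word is expanded to every indexed word whose cosine similarity clears a threshold, and the posting lists of those words are then intersected position by position, as in an ordinary inverted-index phrase search.

```python
import math
from collections import defaultdict

# Hypothetical toy word vectors; a real system would load pretrained embeddings.
EMB = {
    "jazz":     (1.0, 0.1, 0.0, 0.0),
    "blues":    (0.9, 0.3, 0.0, 0.0),
    "musician": (0.0, 0.0, 1.0, 0.1),
    "singer":   (0.0, 0.0, 0.9, 0.3),
    "the":      (0.0, 1.0, 0.0, 0.0),
    "met":      (0.0, 0.0, 0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_index(tokens):
    """Inverted index: word -> set of positions where it occurs."""
    index = defaultdict(set)
    for pos, tok in enumerate(tokens):
        index[tok].add(pos)
    return index


def soft_search(query, index, threshold=0.9):
    """Return start positions where every query word softly matches the corpus."""
    candidates = None
    for offset, q in enumerate(query):
        # Soft step: merge the posting lists of all indexed words close to q,
        # shifting each position back to the start of the would-be match.
        starts = set()
        for word, positions in index.items():
            if cosine(EMB[q], EMB[word]) >= threshold:
                starts |= {p - offset for p in positions}
        # Exact step: keep only starts that survive for every query word.
        candidates = starts if candidates is None else candidates & starts
    return sorted(candidates or [])


TOKENS = "the jazz musician met the blues singer".split()
INDEX = build_index(TOKENS)

print(soft_search(["jazz", "musician"], INDEX))        # [1, 5]
print(soft_search(["jazz", "musician"], INDEX, 0.99))  # [1]
```

With the looser threshold, the query "jazz musician" also matches "blues singer" at position 5; tightening the threshold to 0.99 leaves only the exact phrase. The sketch scans the whole vocabulary for each query word, which is acceptable for a toy corpus but would need a faster similarity lookup at scale.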