Article Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries • 17 days ago • 5
🦢SWIM-IR Dataset [NAACL'24] Collection 29 million Synthetic Wikipedia-based Multilingual Retrieval Training Pairs. • 4 items • Updated Mar 31, 2025 • 8
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated 29 days ago • 158