BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Why a new benchmark?

Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many real-world, complex queries necessitate in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. We introduce BRIGHT to better benchmark retrieval on such challenging and realistic scenarios.

BRIGHT

We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. We collect 1,385 real-world queries from diverse domains (StackExchange, LeetCode, and math competitions), sourced from naturally occurring or carefully curated human data. We pair these queries with web pages linked in StackExchange answers and with theorems tagged in math Olympiad questions, all of which require deliberate reasoning to identify the connections.

Leaderboard submission

To submit your results to the leaderboard, email them to suhongjin96@gmail.com. We encourage you to include a link to the open-source codebase; otherwise, please provide a short description of the models and approaches used (e.g., the size of the retrieval model, and whether LLMs such as GPT-4 or reranking are used).

Have Questions?

Ask us questions on our GitHub issues page or contact Hongjin Su, Howard Yen, or Mengzhou Xia.


Leaderboard

The two tables below cover, in order, the short document setting and the long document setting.

We report the average nDCG@10 score across the 12 datasets in BRIGHT. Instead of the original query, a retriever may use LLM-generated reasoning steps as the query to retrieve relevant documents.
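
As a rough illustration of how an average nDCG@10 across datasets can be computed, here is a minimal Python sketch. It is not the benchmark's official evaluation code. The function names, the input layout (a ranked list of document ids plus a relevance dictionary per query), the use of linear gains, and the scaling to a 0-100 range are all assumptions made for this example.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k gains (rank 1 first)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """nDCG@k for one query.

    `relevance` maps doc id -> gain; documents not listed count as 0.
    Linear gains are an assumption of this sketch.
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_dcg = dcg_at_k(sorted(relevance.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def benchmark_score(per_dataset_runs, k=10):
    """Mean over datasets of the mean per-query nDCG@k, scaled to 0-100.

    `per_dataset_runs` (hypothetical layout) maps a dataset name to a list
    of (ranked_doc_ids, relevance) pairs, one pair per query.
    """
    dataset_means = []
    for runs in per_dataset_runs.values():
        scores = [ndcg_at_k(ranked, rel, k) for ranked, rel in runs]
        dataset_means.append(sum(scores) / len(scores))
    return 100 * sum(dataset_means) / len(dataset_means)
```

Averaging per dataset first and then across datasets gives each dataset equal weight regardless of how many queries it contains, which matches the phrase "average across datasets" as read here. Swapping the original query for LLM reasoning steps only changes how `ranked_doc_ids` is produced, not how it is scored.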

Rank and date	Retriever and organization	Score
1 Aug 28, 2024	BM25, with GPT-4 reasoning and top-100 reranking by Llama-3.1-70B Salesforce Research (proprietary code)	30.4
2 July 11, 2024	BM25, with gpt-4-0125-preview reasoning Microsoft	26.5
3 July 11, 2024	BM25, with Claude-3-Opus reasoning Microsoft	26.3
4 July 11, 2024	instructor-xl, with gpt-4-0125-preview reasoning The University of Hong Kong, University of Washington	26.2
5 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with gpt-4-0125-preview reasoning Google	25.8
6 July 11, 2024	instructor-xl, with Llama-3-70B-Instruct reasoning The University of Hong Kong, University of Washington	25.8
7 July 11, 2024	instructor-xl, with Claude-3-Opus reasoning The University of Hong Kong, University of Washington	25.8
8 July 11, 2024	BM25, with Llama-3-70B-Instruct reasoning Microsoft	25.3
9 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with Claude-3-Opus reasoning Google	25.0
10 July 11, 2024	gte-Qwen1.5-7B-instruct, with gpt-4-0125-preview reasoning Alibaba	24.5
11 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with Llama-3-70B-Instruct reasoning Google	24.5
12 July 11, 2024	gte-Qwen1.5-7B-instruct, with Claude-3-Opus reasoning Alibaba	24.5
13 July 11, 2024	voyage-large-2-instruct, with gpt-4-0125-preview reasoning Voyage AI	24.4
14 July 11, 2024	GritLM-7B, with gpt-4-0125-preview reasoning ContextualAI, The University of Hong Kong, Microsoft	24.0
15 July 11, 2024	instructor-xl, with Gemini-1.0-pro reasoning The University of Hong Kong, University of Washington	24.0
16 July 11, 2024	BM25, with Gemini-1.0-pro reasoning Microsoft	23.5
17 July 11, 2024	text-embedding-3-large, with gpt-4-0125-preview reasoning OpenAI	23.1
18 July 11, 2024	gte-Qwen1.5-7B-instruct, with Llama-3-70B-Instruct reasoning Alibaba	23.1
19 July 11, 2024	instructor-large, with gpt-4-0125-preview reasoning The University of Hong Kong, University of Washington	22.9
20 July 11, 2024	voyage-large-2-instruct, with Llama-3-70B-Instruct reasoning Voyage AI	22.8
21 July 11, 2024	GritLM-7B, with Claude-3-Opus reasoning ContextualAI, The University of Hong Kong, Microsoft	22.8
22 July 11, 2024	voyage-large-2-instruct, with Claude-3-Opus reasoning Voyage AI	22.8
23 July 11, 2024	text-embedding-3-large, with Claude-3-Opus reasoning OpenAI	22.6
24 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-100 reranking by gpt-4-0125-preview Google	22.6
25 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with Gemini-1.0-pro reasoning Google	22.5
26 July 11, 2024	Cohere-embed-english-v3.0, with gpt-4-0125-preview reasoning Cohere	22.3
27 July 11, 2024	instructor-large, with Llama-3-70B-Instruct reasoning The University of Hong Kong, University of Washington	22.3
28 July 11, 2024	gte-Qwen1.5-7B-instruct, with Gemini-1.0-pro reasoning Alibaba	22.3
29 July 11, 2024	gte-Qwen1.5-7B-instruct Alibaba	22.1
30 July 11, 2024	instructor-xl, with GritLM-7B reasoning The University of Hong Kong, University of Washington	22.1
31 July 11, 2024	voyage-large-2-instruct, with Gemini-1.0-pro reasoning Voyage AI	22.1
32 July 11, 2024	text-embedding-3-large, with Llama-3-70B-Instruct reasoning OpenAI	22.0
33 July 11, 2024	Cohere-embed-english-v3.0, with Llama-3-70B-Instruct reasoning Cohere	21.9
34 July 11, 2024	e5-mistral-7b-instruct, with gpt-4-0125-preview reasoning Microsoft	21.8
35 July 11, 2024	SFR-Embedding-Mistral, with gpt-4-0125-preview reasoning Salesforce	21.7
36 July 11, 2024	bge-large-en-v1.5, with gpt-4-0125-preview reasoning Beijing Academy of Artificial Intelligence	21.6
37 July 11, 2024	instructor-large, with Claude-3-Opus reasoning The University of Hong Kong, University of Washington	21.6
38 July 11, 2024	SFR-Embedding-Mistral, with Claude-3-Opus reasoning Salesforce	21.5
39 July 11, 2024	Cohere-embed-english-v3.0, with Claude-3-Opus reasoning Cohere	21.5
40 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-10 reranking by gpt-4-0125-preview Google	21.5
41 July 11, 2024	text-embedding-3-large, with Gemini-1.0-pro reasoning OpenAI	21.2
42 July 11, 2024	e5-mistral-7b-instruct, with Claude-3-Opus reasoning Microsoft	21.1
43 July 11, 2024	bge-large-en-v1.5, with Claude-3-Opus reasoning Beijing Academy of Artificial Intelligence	20.7
44 July 11, 2024	GritLM-7B ContextualAI, The University of Hong Kong, Microsoft	20.6
45 July 11, 2024	GritLM-7B, with Llama-3-70B-Instruct reasoning ContextualAI, The University of Hong Kong, Microsoft	20.5
46 July 11, 2024	GritLM-7B, with Gemini-1.0-pro reasoning ContextualAI, The University of Hong Kong, Microsoft	20.5
47 July 11, 2024	instructor-large, with Gemini-1.0-pro reasoning The University of Hong Kong, University of Washington	20.4
48 July 11, 2024	bge-large-en-v1.5, with Llama-3-70B-Instruct reasoning Beijing Academy of Artificial Intelligence	20.3
49 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-10 reranking by Gemini-1.0-pro Google	20.1
50 July 11, 2024	SFR-Embedding-Mistral, with Gemini-1.0-pro reasoning Salesforce	19.9
51 July 11, 2024	SFR-Embedding-Mistral, with Llama-3-70B-Instruct reasoning Salesforce	19.7
52 July 11, 2024	gte-Qwen1.5-7B-instruct, with GritLM-7B reasoning Alibaba	19.7
53 July 11, 2024	e5-mistral-7b-instruct, with Llama-3-70B-Instruct reasoning Microsoft	19.6
54 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768 Google	19.5
55 July 11, 2024	Cohere-embed-english-v3.0, with Gemini-1.0-pro reasoning Cohere	19.5
56 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with GritLM-7B reasoning Google	19.3
57 July 11, 2024	e5-mistral-7b-instruct, with Gemini-1.0-pro reasoning Microsoft	19.3
58 July 11, 2024	BM25, with GritLM-7B reasoning Microsoft	19.1
59 July 11, 2024	instructor-xl The University of Hong Kong, University of Washington	18.6
60 July 11, 2024	voyage-large-2-instruct, with GritLM-7B reasoning Voyage AI	18.5
61 July 11, 2024	bge-large-en-v1.5, with Gemini-1.0-pro reasoning Beijing Academy of Artificial Intelligence	18.4
62 July 11, 2024	GritLM-7B, with GritLM-7B reasoning ContextualAI, The University of Hong Kong, Microsoft	18.1
63 July 11, 2024	SFR-Embedding-Mistral Salesforce	18.0
64 July 11, 2024	text-embedding-3-large, with GritLM-7B reasoning OpenAI	17.8
65 July 11, 2024	text-embedding-3-large OpenAI	17.6
66 July 11, 2024	voyage-large-2-instruct Voyage AI	17.6
67 July 11, 2024	e5-mistral-7b-instruct Microsoft	17.5
68 July 11, 2024	sentence-transformers, with gpt-4-0125-preview reasoning Technische Universität Darmstadt	17.5
69 July 11, 2024	e5-mistral-7b-instruct, with GritLM-7B reasoning Microsoft	17.5
70 July 11, 2024	BM25, top-10 reranking by gpt-4-0125-preview Microsoft	17.4
71 July 11, 2024	SFR-Embedding-Mistral, with GritLM-7B reasoning Salesforce	17.2
72 July 11, 2024	BM25, top-100 reranking by gpt-4-0125-preview Microsoft	17.0
73 July 11, 2024	Cohere-embed-english-v3.0 Cohere	16.3
74 July 11, 2024	sentence-transformers, with Llama-3-70B-Instruct reasoning Technische Universität Darmstadt	16.1
75 July 11, 2024	sentence-transformers, with Claude-3-Opus reasoning Technische Universität Darmstadt	16.1
76 July 11, 2024	Cohere-embed-english-v3.0, with GritLM-7B reasoning Cohere	16.0
77 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-10 reranking by MiniLM Google	16.0
78 July 11, 2024	bge-large-en-v1.5, with GritLM-7B reasoning Beijing Academy of Artificial Intelligence	15.7
79 July 11, 2024	instructor-large, with GritLM-7B reasoning The University of Hong Kong, University of Washington	15.7
80 July 11, 2024	BM25, top-10 reranking by Gemini-1.0-pro Microsoft	15.7
81 July 11, 2024	sentence-transformers, with Gemini-1.0-pro reasoning Technische Universität Darmstadt	15.3
82 July 11, 2024	sentence-transformers Technische Universität Darmstadt	14.6
83 July 11, 2024	BM25 Microsoft	14.3
84 July 11, 2024	instructor-large The University of Hong Kong, University of Washington	14.0
85 July 11, 2024	sentence-transformers, with GritLM-7B reasoning Technische Universität Darmstadt	13.7
86 July 11, 2024	bge-large-en-v1.5 Beijing Academy of Artificial Intelligence	13.6
87 July 11, 2024	BM25, top-10 reranking by MiniLM Microsoft	13.1
88 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-100 reranking by MiniLM Google	9.2
89 July 11, 2024	BM25, top-100 reranking by MiniLM Microsoft	8.3

Rank and date	Retriever and organization	Score
1 July 11, 2024	gte-Qwen1.5-7B-instruct Alibaba	27.8
2 July 11, 2024	SFR-Embedding-Mistral Salesforce	26.0
3 July 11, 2024	GritLM-7B ContextualAI, The University of Hong Kong, Microsoft	26.0
4 July 11, 2024	e5-mistral-7b-instruct Microsoft	25.5
5 July 11, 2024	voyage-large-2-instruct Voyage AI	24.6
6 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768 Google	22.4
7 July 11, 2024	text-embedding-3-large OpenAI	21.9
8 July 11, 2024	Cohere-embed-english-v3.0 Cohere	18.4
9 July 11, 2024	instructor-large The University of Hong Kong, University of Washington	18.2
10 July 11, 2024	instructor-xl The University of Hong Kong, University of Washington	17.8
11 July 11, 2024	sentence-transformers Technische Universität Darmstadt	17.4
12 July 11, 2024	bge-large-en-v1.5 Beijing Academy of Artificial Intelligence	14.8
13 July 11, 2024	BM25 Microsoft	11.4