Question about the result for MTEB(Multilingual) #2027

afalf · 2025-02-10T14:55:57Z

The calculation logic of MTEB(Multilingual) seems to involve many multilingual datasets, where the main score for each dataset is the average of the results across various languages (considering only the languages that the model has been evaluated on). I am quite puzzled about how it is ensured that all the models currently on the leaderboard have been evaluated on all the languages. Are these results now comparable?

Samoed (Collaborator) · 2025-02-10T15:02:35Z

We check that results are present for all splits of each dataset.

2 replies

KennethEnevoldsen Feb 10, 2025
Maintainer

So just to expand: If a model is run on a task for only English, but not all languages, then it will be filtered out and not included in the leaderboard.
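To make the filtering rule concrete, here is a minimal sketch in plain Python. It is not the mteb implementation: the function names, the nested-dict layout, and the model names are all assumptions made for illustration. A model is kept only if it has a score for every language the benchmark requires on every task.

```python
# Hypothetical layout (not the mteb API):
#   results[model][task][language] -> score
#   required[task] -> set of languages the benchmark specifies for that task


def has_complete_results(model_results, required):
    """Return True if the model has a score for every required language of every task."""
    return all(
        set(languages) <= set(model_results.get(task, {}))
        for task, languages in required.items()
    )


def filter_models(results, required):
    """Keep only the models whose results cover all required languages."""
    return {
        model: model_results
        for model, model_results in results.items()
        if has_complete_results(model_results, required)
    }


required = {"MassiveIntentClassification": {"eng", "dan"}}
results = {
    # Evaluated on both required languages: kept.
    "model_a": {"MassiveIntentClassification": {"eng": 0.80, "dan": 0.81}},
    # Evaluated on English only: filtered out.
    "model_b": {"MassiveIntentClassification": {"eng": 0.80}},
}
kept = filter_models(results, required)
```

Under these assumptions, `kept` contains only `model_a`, since `model_b` is missing the `dan` result.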

KennethEnevoldsen Feb 10, 2025
Maintainer

If a task appears on two benchmarks, it can have different languages specified (and thus the score will be different). We could write all of these out:

MassiveIntentClassification (eng), MassiveIntent (dan), MassiveIntent...
0.80, 0.81, ...

Currently, however, the average is computed across tasks.
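A rough sketch of that two-level averaging follows, under stated assumptions: the helper names and the dict layout are hypothetical, the benchmark definitions are invented for illustration, and this is not the leaderboard's actual code. Each task score is the mean over the languages the benchmark specifies, and the benchmark score is the mean of those task scores.

```python
from statistics import mean

# Hypothetical layout (not the mteb API):
#   results[task][language] -> score
#   benchmark[task] -> list of languages that benchmark specifies for the task


def task_score(language_scores, languages):
    """Average a task's scores over the languages the benchmark specifies."""
    return mean(language_scores[lang] for lang in languages)


def benchmark_score(results, benchmark):
    """Average the per-task scores across all tasks in the benchmark."""
    return mean(
        task_score(results[task], languages)
        for task, languages in benchmark.items()
    )


results = {"MassiveIntentClassification": {"eng": 0.80, "dan": 0.81}}

# The same task, specified with different languages in two benchmarks.
multilingual = {"MassiveIntentClassification": ["eng", "dan"]}
english_only = {"MassiveIntentClassification": ["eng"]}

score_multilingual = benchmark_score(results, multilingual)
score_english = benchmark_score(results, english_only)
```

Here the same task contributes a different score to each benchmark because the language lists differ, which is why a task shared between two benchmarks need not show the same number in both.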
