-
Oh, this is very interesting! It seems the ranking is generally maintained, with a few exceptions.
The relationship is definitely not linear, though, so I would love to see a Spearman correlation here. It is also worth examining the outliers (KURE-v1 seems to perform really well). We could also try a cosine version of K-means; some of the implementations already normalize the vectors before returning them.
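A minimal sketch of both suggestions, assuming scikit-learn and SciPy are available (the embedding matrix and the per-model scores below are hypothetical stand-ins, not real MTEB results):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Hypothetical embeddings standing in for real model outputs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 8))

# "Cosine K-means" via L2 normalization: on unit vectors,
# ||u - v||^2 = 2 - 2*cos(u, v), so Euclidean K-means on normalized
# embeddings orders distances the same way cosine distance does.
emb_norm = normalize(emb)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(emb_norm)

# Spearman correlation between per-model scores under the two setups
# (hypothetical V-measure values, one entry per model).
euclidean_scores = [0.41, 0.38, 0.52, 0.47]
cosine_scores = [0.43, 0.37, 0.55, 0.49]
rho, p_value = spearmanr(euclidean_scores, cosine_scores)
```

A rank correlation close to 1 would support the observation that the leaderboard ordering survives the change of metric even if the scores themselves shift.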
-
Hello,
I am interested in evaluating Korean text embeddings, and I have a question about the clustering task evaluation method in MTEB.
MTEB evaluates the clustering task using the K-means algorithm. However, since K-means measures distance in Euclidean space, this approach may not be well suited to text embedding models trained to measure similarity via cosine similarity or inner product.
So I considered an alternative: applying spectral clustering to a graph constructed from cosine similarities. In fact, I observed an improvement in the V-measure score with this method.
I would like to hear your thoughts on this approach.
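For concreteness, here is a small sketch of the idea, assuming scikit-learn (the two synthetic clusters are illustrative, not MTEB data; spectral clustering expects nonnegative affinities, so negative cosine similarities are clipped to zero):

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import v_measure_score
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical groups of embeddings pointing in different directions.
rng = np.random.default_rng(0)
a = rng.normal(loc=[3, 0, 0], scale=0.5, size=(30, 3))
b = rng.normal(loc=[0, 3, 0], scale=0.5, size=(30, 3))
emb = np.vstack([a, b])
true_labels = np.array([0] * 30 + [1] * 30)

# Cosine-similarity affinity matrix, clipped to be nonnegative.
affinity = np.clip(cosine_similarity(emb), 0, None)

# Spectral clustering on the precomputed similarity graph.
pred_labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)

score = v_measure_score(true_labels, pred_labels)
print(score)
```

Since V-measure is invariant to label permutations, it can compare this setup directly against the standard K-means evaluation.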
Best regards,