-
Oh, this is very interesting! It seems the ranking is generally maintained, with a few exceptions.
The relationship is definitely not linear, though, so I would love to see a Spearman correlation here. It is also worth examining the outliers (KURE-v1 seems to perform really well). We could also try a cosine version of K-means; some of the implementations already normalize the vectors before returning them.
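A minimal sketch of both suggestions, assuming scikit-learn and SciPy are available (the embedding matrix and the per-model scores below are hypothetical stand-ins, not real MTEB results):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Hypothetical embeddings standing in for real model outputs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 8))

# "Cosine K-means" via L2 normalization: on unit vectors,
# ||u - v||^2 = 2 - 2*cos(u, v), so Euclidean K-means on normalized
# embeddings orders distances the same way cosine distance does.
emb_norm = normalize(emb)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(emb_norm)

# Spearman correlation between per-model scores under the two setups
# (hypothetical V-measure values, one entry per model).
euclidean_scores = [0.41, 0.38, 0.52, 0.47]
cosine_scores = [0.43, 0.37, 0.55, 0.49]
rho, p_value = spearmanr(euclidean_scores, cosine_scores)
```

A rank correlation close to 1 would support the observation that the leaderboard ordering survives the change of metric even if the scores themselves shift.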
-
Hello,
I am interested in evaluating Korean text embeddings, and I have a question about the clustering task evaluation method in MTEB.
MTEB evaluates the clustering task using the K-means algorithm. However, since K-means measures distance in Euclidean space, this approach may not be well suited to text embedding models trained to measure similarity via cosine similarity or inner product.
So I considered an alternative: applying spectral clustering to a graph constructed from cosine similarities. In fact, I observed an improvement in the V-measure score with this method.
I would like to hear your thoughts on this approach.
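For concreteness, here is a small sketch of the idea, assuming scikit-learn (the two synthetic clusters are illustrative, not MTEB data; spectral clustering expects nonnegative affinities, so negative cosine similarities are clipped to zero):

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import v_measure_score
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical groups of embeddings pointing in different directions.
rng = np.random.default_rng(0)
a = rng.normal(loc=[3, 0, 0], scale=0.5, size=(30, 3))
b = rng.normal(loc=[0, 3, 0], scale=0.5, size=(30, 3))
emb = np.vstack([a, b])
true_labels = np.array([0] * 30 + [1] * 30)

# Cosine-similarity affinity matrix, clipped to be nonnegative.
affinity = np.clip(cosine_similarity(emb), 0, None)

# Spectral clustering on the precomputed similarity graph.
pred_labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)

score = v_measure_score(true_labels, pred_labels)
print(score)
```

Since V-measure is invariant to label permutations, it can compare this setup directly against the standard K-means evaluation.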
Best regards,