From unsupervised to supervised metrics

Embeddings, also known as representations, are dense vector representations of entities such as words, documents, products, and more. They are designed to capture semantic meaning and highlight similarities between entities. A good set of representations should not only efficiently encode the essential features of entities but also exhibit properties like compactness, meaningfulness, and robustness across various tasks. In this article, we look into various evaluation metrics for assessing the quality of representations. Let’s start.
Any evaluation framework consists of three essential components:
- A baseline method: this serves as a benchmark against which new approaches or models are compared. It provides a reference point for evaluating the performance of the proposed methods.
- A set of evaluation metrics: quantitative measures used to assess the performance of the models. These metrics can be supervised or unsupervised, and they define how the success of the outputs is judged.
- An evaluation dataset: a collection of labeled/annotated or unlabeled data used to assess the performance of the models. This dataset should be representative of the real-world scenarios the models are expected to handle, and it must cover a diverse range of examples to ensure a comprehensive evaluation.
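As a minimal sketch of how these three components fit together (all function names here are illustrative, not from any particular library), an evaluation run takes a method, a metric, and a dataset:

```python
import numpy as np

def evaluate(embed_fn, metric_fn, dataset):
    """Run one evaluation: embed the dataset, then score the embeddings."""
    embeddings = embed_fn(dataset)
    return metric_fn(embeddings)

# Component 1: a baseline method (here, a stand-in that ignores the input
# and returns fixed-seed random vectors).
def baseline_embed(dataset):
    rng = np.random.default_rng(seed=0)
    return rng.normal(size=(len(dataset), 16))

# Component 2: an evaluation metric. As an unsupervised example, the mean
# pairwise cosine similarity -- a rough proxy for how spread out the
# embedding space is (no ground-truth labels needed).
def mean_pairwise_cosine(embeddings):
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(sims)
    return (sims.sum() - n) / (n * (n - 1))  # exclude self-similarity

# Component 3: an evaluation dataset (a toy list of entities).
dataset = ["apple", "banana", "car", "train"]

score = evaluate(baseline_embed, mean_pairwise_cosine, dataset)
```

Swapping `baseline_embed` for a real embedding model, while keeping the metric and dataset fixed, is what makes scores comparable.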
Based on whether evaluation metrics require ground-truth labels, we can split them into unsupervised metrics and supervised metrics. It is often advantageous to employ unsupervised metrics, as they don’t require labels, and collecting labels can be very expensive in practice.
Below, we will look into state-of-the-art metrics. For each metric, pick a baseline method to compare your evaluations against. The baseline can be as simple as a `random embedding generator`!
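A `random embedding generator` baseline can be as simple as assigning every entity an independent Gaussian vector; a hypothetical sketch (the function and vocabulary below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_embedding_generator(vocab, dim=64):
    """Baseline: assign every entity an i.i.d. Gaussian vector."""
    return {word: rng.normal(size=dim) for word in vocab}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vocab = ["king", "queen", "apple"]
baseline = random_embedding_generator(vocab)

# Any metric computed on real embeddings should beat the score obtained
# on these random vectors; otherwise the model adds no useful structure.
sim = cosine(baseline["king"], baseline["queen"])
```

Random high-dimensional vectors are nearly orthogonal on average, so semantically related pairs like `king`/`queen` score no higher than unrelated ones — exactly the behavior a useful embedding model should improve on.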
Supervised metrics require a labeled evaluation dataset. A common strategy is to choose a predictor such as a classifier or regressor, then train the predictor on a limited set of labeled data from a…