Latest Research Papers
2024-12-26
arXiv
Jasper and Stella: distillation of SOTA embedding models
The paper introduces a distillation technique for building smaller, efficient embedding models, a method for reducing vector dimensions, and image-text alignment training for multimodal encoding, achieving high scores on the MTEB leaderboard.
A crucial component of many deep learning applications (such as FAQ and RAG)
is dense retrieval, in which embedding models convert raw text into numerical
vectors and the most similar texts are then retrieved via MIPS (Maximum Inner
Product Search). Several text embedding benchmarks (e.g. MTEB, BEIR, and
AIR-Bench) have been established to evaluate embedding models accurately.
Thanks to these benchmarks, we can identify SOTA models; however, the
deployment and application of these models in industry are hampered by their
large vector dimensions and numerous parameters. To alleviate this problem,
1) we present a distillation technique that enables a smaller student model to
achieve strong performance; 2) inspired by Matryoshka Representation Learning
(MRL), we present a training approach that reduces vector dimensions based on
the student's own vectors or its teacher's vectors; and 3) we perform simple
yet effective alignment training between images and text to make the model a
multimodal encoder. We trained the Stella and Jasper models using the
techniques above and achieved high scores on the MTEB leaderboard. The model
and data are released on the Hugging Face Hub
(https://huggingface.co/infgrad/jasper_en_vision_language_v1), and the training
logs are available at https://api.wandb.ai/links/dunnzhang0/z8jqoqpb.
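For readers new to the setup the abstract describes, here is a minimal sketch of dense retrieval with MIPS: texts are embedded into vectors and the best match is found by maximum inner product. The sentence-transformers library and the all-MiniLM-L6-v2 model are illustrative stand-ins, not the paper's Stella/Jasper setup.

```python
# Minimal dense-retrieval sketch: embed texts, then rank by inner product (MIPS).
# Assumes the sentence-transformers package; the model name is an illustrative
# stand-in, not the paper's Stella/Jasper models.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "How do I reset my password?",
    "Shipping usually takes 3-5 business days.",
    "Our API rate limit is 100 requests per minute.",
]
query = "I forgot my password"

# With normalized embeddings, inner product equals cosine similarity.
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)

scores = query_vec @ doc_vecs.T          # maximum inner product search
best = int(np.argmax(scores))
print(docs[best], float(scores[0, best]))
```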
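The abstract only names the distillation and MRL-inspired dimension-reduction steps; the following is a hedged sketch of the general idea rather than the paper's actual loss: truncated prefixes of the student's embedding are aligned with the teacher's embedding so that shorter vectors remain usable on their own. The function name, prefix lengths, and cosine-based objective are assumptions for illustration.

```python
# Sketch of an MRL-style distillation objective (illustrative, not the paper's loss).
import torch
import torch.nn.functional as F

def mrl_style_distill_loss(student_emb, teacher_emb, dims=(256, 512, 1024)):
    """Align truncated student prefixes with the teacher embedding.

    For each prefix length d, the first d student dimensions are compared
    against the first d teacher dimensions via cosine similarity, so that
    shorter prefixes remain useful as standalone embeddings.
    """
    loss = 0.0
    for d in dims:
        s = F.normalize(student_emb[:, :d], dim=-1)
        t = F.normalize(teacher_emb[:, :d], dim=-1)  # assumes teacher dim >= d
        loss = loss + (1.0 - (s * t).sum(dim=-1)).mean()
    return loss / len(dims)

# Toy usage with random vectors standing in for real model outputs.
student = torch.randn(8, 1024)
teacher = torch.randn(8, 1024)
print(mrl_style_distill_loss(student, teacher).item())
```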
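The image-text alignment step is likewise only mentioned, not specified; a common way to realize such alignment is a symmetric contrastive (CLIP-style) objective on paired image and text embeddings, sketched below under that assumption.

```python
# Sketch of a CLIP-style image-text alignment loss (an assumed objective,
# not necessarily the one used in the paper).
import torch
import torch.nn.functional as F

def image_text_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss on paired image/text embeddings.

    Matched pairs sit on the diagonal of the similarity matrix; each row and
    column is treated as a classification problem over the batch.
    """
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    logits = img @ txt.T / temperature
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random vectors standing in for encoder outputs.
print(image_text_alignment_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
```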