Figure 1: Performance of various models on text representation (embedding) and generation tasks.

By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models.

Retrieve duplicate questions from AskUbuntu forum.

Represent the title of a user question to find a duplicate user question title with body from the Programmers StackExchange forum.

This boost likely comes from our training data containing many documents beyond … tokens, which need to be truncated if the maximum sequence length is
nest...
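To make the truncation argument concrete, here is a minimal sketch, not the paper's code, that measures how many documents exceed a given maximum sequence length and what share of all tokens right truncation would discard. The document lengths and the 512 and 2048 limits are hypothetical values chosen for illustration. Right truncation (keeping the first tokens) is also an assumption.

```python
from typing import List, Tuple


def truncate(tokens: List[int], max_len: int) -> List[int]:
    """Right truncation: keep only the first max_len tokens (assumed strategy)."""
    return tokens[:max_len]


def truncation_stats(doc_lengths: List[int], max_len: int) -> Tuple[float, float]:
    """Return (fraction of documents truncated, fraction of all tokens dropped)."""
    if not doc_lengths:
        return 0.0, 0.0
    truncated_docs = sum(1 for n in doc_lengths if n > max_len)
    total_tokens = sum(doc_lengths)
    dropped_tokens = sum(max(0, n - max_len) for n in doc_lengths)
    token_frac = dropped_tokens / total_tokens if total_tokens else 0.0
    return truncated_docs / len(doc_lengths), token_frac


if __name__ == "__main__":
    # Hypothetical document lengths in tokens (illustrative, not data from the paper).
    lengths = [300, 900, 1500, 2500, 4000]
    for max_len in (512, 2048):
        docs_frac, tok_frac = truncation_stats(lengths, max_len)
        print(max_len, round(docs_frac, 2), round(tok_frac, 2))
```

With a shorter limit, more documents lose content and a larger share of the training signal is discarded. This is the kind of gap a longer maximum sequence length would close.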