Yasa-1: Our Multimodal AI Assistant

Introduction

Today, we are excited to release the first version of our multimodal assistant Yasa-1. It is a language assistant with visual and auditory sensors that can take actions via code execution.

We trained Yasa-1 from scratch, pretraining the base models from the ground up, aligning the multimodal models, and heavily optimizing both our training and serving infrastructure.

In addition to being multimodal, Yasa-1 comes with a myriad of exciting features, including long-context document processing, fast natively optimized retrieval augmented generation, multilingual support (20 languages), a search engine interface, and a code interpreter.

Yasa-1 is currently in private preview. It is available both via our APIs and as Docker containers for on-premise or virtual private cloud deployment (see the documentation here). We are committed to deploying Yasa safely and responsibly, and we will be opening up access to more enterprise partners in the coming weeks. Please reach out to us at contact@reka.ai for a demo of Yasa-1.

Capabilities and Features of Yasa-1

Multimodal Understanding

Search and Retrieval

Supporting fresh content with live search

Yasa-1 has a convenient search engine flag that connects it to the web. Turning it on allows us to tap into various commercial search engine offerings. This enables the model to use fresh information without any cutoff date limitations.
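As a rough illustration of what enabling live search might look like from a client, here is a minimal sketch. The endpoint, header, and field names below (including the use_search_engine flag) are assumptions for illustration, not the documented API.

```python
# Hypothetical sketch of enabling live search; endpoint, header, and
# parameter names are illustrative assumptions, not the documented API.
import requests

API_URL = "https://api.reka.ai/chat"  # assumed endpoint for illustration
API_KEY = "YOUR_API_KEY"

payload = {
    "conversation_history": [
        {"type": "human", "text": "Who won yesterday's Champions League match?"}
    ],
    "use_search_engine": True,  # the search engine flag described above
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"X-Api-Key": API_KEY},
)
print(response.json())
```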

Retrieval augmented generation

Yasa can be taught to understand private datasets. Our API and on-premise deployment setup allows seamless integration of internal datasets of any modality type. 

We handle the construction of embedding services and vector databases, as well as the adaptation process to private datasets, allowing users to focus on building amazing experiences.

Being an end-to-end model provider allows us to train Yasa-1 to accurately use information from retrieved documents beyond standard prompting techniques.
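To make the workflow concrete, the sketch below shows how registering a private dataset and then asking a grounded question might look from a client. The endpoints and field names (datasets, retrieval_dataset) are hypothetical placeholders, not the documented API.

```python
# Hypothetical sketch of retrieval augmented generation over a private
# dataset; endpoints and field names are assumptions for illustration only.
import requests

API_URL = "https://api.reka.ai"  # assumed base URL
HEADERS = {"X-Api-Key": "YOUR_API_KEY"}

# 1. Register a private dataset; embedding and indexing are handled server-side.
requests.post(
    f"{API_URL}/datasets",
    json={"name": "internal-wiki", "documents": ["doc1 text ...", "doc2 text ..."]},
    headers=HEADERS,
)

# 2. Ask a question grounded in that dataset.
response = requests.post(
    f"{API_URL}/chat",
    json={
        "conversation_history": [
            {"type": "human", "text": "Summarize our incident response policy."}
        ],
        "retrieval_dataset": "internal-wiki",  # assumed flag selecting the dataset
    },
    headers=HEADERS,
)
print(response.json())
```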

Long context model and retrieval for very long documents

Our long-context model (sagittarius-v1) currently supports 24K tokens by default.

However, we found that there is huge headroom in natively optimizing retrieval to work with long-context models. We have verified that our setup works with documents as long as 100K tokens.
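The general pattern behind combining retrieval with a long-context model can be sketched as follows: split the long document into chunks, embed them, keep only the chunks most relevant to the query, and prompt the 24K-context model with those. The snippet below is a minimal illustration of that pattern using generic open-source components (sentence-transformers); it is not a description of our production retrieval stack.

```python
# Minimal illustration of retrieval over a very long document: chunk, embed,
# retrieve top-k chunks, and prompt a long-context model with only those chunks.
# Uses generic open-source components; NOT Reka's production retrieval stack.
from sentence_transformers import SentenceTransformer, util

def chunk(text: str, size: int = 2000) -> list[str]:
    """Split a long document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_prompt(document: str, question: str, top_k: int = 8) -> str:
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    chunks = chunk(document)
    chunk_emb = embedder.encode(chunks, convert_to_tensor=True)
    query_emb = embedder.encode(question, convert_to_tensor=True)
    # Keep only the chunks most similar to the question so the prompt
    # fits comfortably within a 24K-token context window.
    hits = util.semantic_search(query_emb, chunk_emb, top_k=top_k)[0]
    selected = [chunks[hit["corpus_id"]] for hit in hits]
    context = "\n---\n".join(selected)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```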

To test this, we created a high-quality benchmark specifically for monitoring and tracking performance on realistic, long-context tasks. One of our internal datasets is constructed from publicly available movie plots. We measured the speed and accuracy of Yasa-1 on questions about these plots and found that our setup achieves comparable quality while being approximately 8x faster than using a 100K-context state-of-the-art model directly.

Model                             | Accuracy | Median Time Per Query
External model A (no context)     | 26%      | 1.96s
External model B (100K context)   | 83%      | 59.4s
Yasa-1 (text-sagittarius-v1)      | 82%      | 7.9s
Table 1: Comparing Yasa-1 long context against external models on an internal benchmark of reading movie plots.

Code Interpreter

Yasa is not just a passive AI assistant. It can take actions by executing code. We support this feature via a simple flag. When the flag is activated, Yasa automatically identifies the code block in its response, executes the code, and appends the execution output at the end of the code block. 
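A minimal sketch of how this flag might be set from a client is shown below. The field name enable_code_interpreter and the endpoint are assumptions for illustration, not the documented API.

```python
# Hypothetical sketch of the code interpreter flag; the field name
# "enable_code_interpreter" and the endpoint are assumptions for illustration.
import requests

response = requests.post(
    "https://api.reka.ai/chat",  # assumed endpoint
    json={
        "conversation_history": [
            {"type": "human",
             "text": "What is the compound interest on $1,000 at 5% for 10 years?"}
        ],
        "enable_code_interpreter": True,  # the flag described above
    },
    headers={"X-Api-Key": "YOUR_API_KEY"},
)
# When the flag is on, the returned text contains the generated code block
# followed by its execution output appended at the end of the block.
print(response.json()["text"])
```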

Below, we show how this feature can be used to perform arithmetic operations, analyze spreadsheets, and create visualizations.

Arithmetic
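Continuing the compound interest question from the sketch above, the following is a hypothetical illustration of the kind of code block Yasa might generate and execute; the numbers are illustrative, not an actual model output.

```python
# Illustrative example of a generated code block for the question
# "What is the compound interest on $1,000 at 5% for 10 years?"
principal = 1_000.0
rate = 0.05
years = 10

final_amount = principal * (1 + rate) ** years
interest = final_amount - principal
print(f"Compound interest after {years} years: ${interest:.2f}")
# Execution output appended by the interpreter:
# Compound interest after 10 years: $628.89
```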

File reading and graph plotting
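For spreadsheet analysis and plotting, a generated code block might look like the sketch below. The file name sales.csv and its columns are hypothetical; this is an illustration of the kind of code the interpreter would run, not an actual model output.

```python
# Illustrative example of a generated code block that reads a spreadsheet and
# plots a graph; the file "sales.csv" and its columns are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")  # columns assumed: "month", "revenue"
monthly = df.groupby("month", sort=False)["revenue"].sum()

monthly.plot(kind="bar", title="Revenue by month")
plt.xlabel("Month")
plt.ylabel("Revenue (USD)")
plt.tight_layout()
plt.savefig("revenue_by_month.png")
print(monthly.describe())
```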

Other examples


Customization

For any of the use cases above, our model can be further customized for the best performance. If you are interested in a customized Yasa, please reach out to us at contact@reka.ai.

Evaluation

We recognize the importance of deploying a frontier technology such as a multimodal AI assistant responsibly. Instead of relying on static datasets, we have built a dynamic, multidimensional evaluation framework to scientifically benchmark our AI assistant across three dimensions: correctness, safety, and helpfulness. We use either human or automatic evaluation to obtain these scores.

Correctness evaluates how accurate an answer is and penalizes outputs that contain false or misleading information. Safety measures how suitable an answer is for an AI assistant and penalizes outputs that harm the user or other individuals, or that involve illegal or controversial content. Helpfulness checks whether an answer helps the user achieve their goal and penalizes outputs that do not follow user-provided instructions.

The three axes above can be combined to produce an overall quality score. For example, according to this metric, Yasa is XX% comparable or better than a publicly available multimodal AI assistant on image-related prompts. For text-based use cases, Yasa is 65% comparable or better than a state-of-the-art language-only AI assistant.
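As a sketch of how such per-prompt judgments could be aggregated into a "% comparable or better" number, one simple option is to average the three axis scores per response and count the prompts where our score ties or beats the baseline. The actual weighting and aggregation we use are not described here, so the combination below is an assumption for illustration only.

```python
# Hypothetical sketch of combining correctness, safety, and helpfulness into
# an overall score and computing "% comparable or better" against a baseline.
# The actual weighting and aggregation used in our evaluation may differ.
from dataclasses import dataclass

@dataclass
class Judgment:
    correctness: float  # each axis scored in [0, 1] by a human or automatic judge
    safety: float
    helpfulness: float

    def overall(self) -> float:
        # Simple unweighted average as an illustrative combination.
        return (self.correctness + self.safety + self.helpfulness) / 3

def pct_comparable_or_better(ours: list[Judgment], baseline: list[Judgment]) -> float:
    """Fraction of prompts where our overall score ties or beats the baseline."""
    wins_or_ties = sum(a.overall() >= b.overall() for a, b in zip(ours, baseline))
    return 100.0 * wins_or_ties / len(ours)
```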

Limitations

Yasa may generate inaccurate outputs. Do not rely on it for critical advice.

For multimodal understanding, Yasa is best used to obtain high-level descriptions of image, video, or audio content. Without customization, it has limited ability to identify fine-grained details in multimodal media.

For search and retrieval, while we provide citations, there is no guarantee that Yasa retrieves the most relevant documents for a given question. We support customization to improve retrieval performance.

Code execution is currently only available in on-premise deployments.

Expect Yasa’s capabilities to improve significantly in the next few months.

Closing Remarks

We are proud to have one of the best models in its compute class, but we are only getting started. Yasa is a generative agent with multimodal capabilities.

It is a first step towards our long-term mission to build a future where superintelligent AI is a force for good, working alongside humans to solve our major challenges.

We are hiring strong technical talent anywhere in the world. If you are excited about our mission and our work, please apply here.

Reka Team

Aleksiev, Kaloyan
Artetxe, Mikel
Bain, Max
Fu, Deyu
Henderson, Matt
Lamprecht, Eugenie
Lei, Li
Liu, Qi
Masson d’Autume, Cyprien
Ong, Donovan
Ormazabal, Aitor
Pham, Hai
Phua, Samuel
Tay, Yi
Wang, Yuqi
Yang, Yazheng
Yogatama, Dani
Zheng, Che
Zhu, Zhongkai