xAI adds image processing to the new version of its Grok AI

xAI, the OpenAI competitor founded by Elon Musk, has unveiled Grok-1.5V, the first version of Grok that can process visual information. In addition to text, the multimodal model can interpret documents, diagrams, charts, screenshots and photographs. In its announcement, xAI gave examples of real-world uses for Grok-1.5V, such as translating a flow chart into Python code, writing a story based on a drawing and explaining memes.

The release comes just weeks after the introduction of Grok-1.5. That model improved on its predecessor in coding and maths tasks and could process longer contexts, allowing it to draw on more information at once for better understanding.

xAI has not given an exact timeline for the rollout of Grok-1.5V, but the company said it will be available soon to early testers and existing Grok users.

xAI has also released a benchmark dataset called RealWorldQA for evaluating AI models. The dataset contains more than 700 images, each paired with a question whose answer is easy to verify but which can still challenge multimodal models such as Grok.

According to xAI, Grok-1.5V achieved the highest RealWorldQA score when compared with competing models such as OpenAI's GPT-4V and Google's Gemini 1.5 Pro.