New Apple AI model uses text commands to edit images

While Apple may not be considered a top contender in the AI arena, the company's latest open-source image-editing model shows what it can contribute to the field. Named MLLM-Guided Image Editing (MGIE), the model uses multimodal large language models (MLLMs) to interpret text-based commands for manipulating images; in essence, it lets users edit photos simply by describing the change they want. Similar tools already exist, but the project's paper points out that "human instructions are sometimes too brief for current methods to capture and follow."

Developed in collaboration with researchers from the University of California, Santa Barbara, MGIE harnesses MLLMs to translate simple or ambiguous text prompts into detailed and actionable instructions for the photo editor. For example, a command to “make a pepperoni pizza more healthy” might be interpreted as “add vegetable toppings” and executed accordingly.
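
To make that two-stage idea concrete, here is a minimal Python sketch of the pipeline as the paper describes it: an MLLM first rewrites a terse request into an explicit instruction, and an editing model then applies it. This is an illustration only; the function names, the `mllm.generate` and `editor.edit` interfaces, and the prompt wording are all assumptions, not Apple's actual API.

```python
# Minimal sketch of MGIE's two-stage idea (not Apple's actual API).
# Assumptions: mllm exposes generate(image, prompt) and editor exposes
# edit(image, instruction); both interfaces are hypothetical.

from PIL import Image


def expand_instruction(mllm, image: Image.Image, instruction: str) -> str:
    """Ask the multimodal LLM to turn a vague request into an explicit,
    actionable edit description (hypothetical interface)."""
    prompt = ("Rewrite this photo-editing request as a concrete, "
              f"actionable instruction: {instruction}")
    return mllm.generate(image=image, prompt=prompt)


def apply_edit(mllm, editor, image: Image.Image, instruction: str) -> Image.Image:
    # Step 1: e.g. "make a pepperoni pizza more healthy" becomes
    #         something like "add vegetable toppings to the pizza".
    expressive = expand_instruction(mllm, image, instruction)
    # Step 2: a diffusion-based editor conditions on the expanded text.
    return editor.edit(image=image, instruction=expressive)
```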

In addition to facilitating significant image alterations, MGIE can perform tasks such as cropping, resizing, and rotating photos, as well as adjusting brightness, contrast, and colour balance, all through text commands. Furthermore, it enables targeted edits to specific areas of a photo, such as modifying a person’s hair, eyes, or clothing, or removing elements from the background.
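
In practice, those capabilities boil down to plain-language commands. The examples below, grouped to mirror the paragraph above, are illustrative guesses at the kind of phrasing such a model accepts, not prompts taken from Apple's documentation.

```python
# Illustrative text commands for the edit types described above.
# The exact wording MGIE expects is an assumption.

EXAMPLE_COMMANDS = {
    "global adjustments": [
        "crop the photo to a square",
        "rotate the image 90 degrees clockwise",
        "increase the brightness and warm up the colour balance",
    ],
    "targeted local edits": [
        "make the person's hair blonde",
        "remove the cars from the background",
    ],
}
```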

Apple has made the model available on GitHub, and interested parties can try a demo hosted on Hugging Face Spaces. However, Apple has not said whether any of the techniques behind the project will make their way into its products as a tool or feature.
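
For those who would rather poke at the demo from code than from the browser, Gradio-based Spaces can generally be driven with the gradio_client library. The sketch below is hedged accordingly: the Space id and the predict() argument list are assumptions, and the real endpoint signature should be read off the Space's "Use via API" panel.

```python
# Hedged sketch: calling a Hugging Face Space programmatically with
# gradio_client. The Space id and argument list here are hypothetical;
# consult the actual MGIE Space's API docs for the real signature.

from gradio_client import Client, handle_file

client = Client("apple/mgie")  # hypothetical Space id
result = client.predict(
    handle_file("pizza.jpg"),               # input image
    "make a pepperoni pizza more healthy",  # text instruction
    api_name="/predict",                    # assumed endpoint name
)
print(result)  # typically a path to the edited image written by the demo
```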