Google has introduced Whisk, a new image generation tool within Google Labs that uses existing images as prompts. However, Whisk doesn’t recreate or edit the source image; instead, it captures its “essence” to generate new, stylized visuals. This makes it more suitable for brainstorming and quick visualizations than precise image manipulation.
Described as “a new type of creative tool,” Whisk offers a basic interface with inputs for style and subject. Initially, users can choose from three predefined styles: sticker, enamel pin, and plushie. These simpler styles, which produce rougher, outline-based outputs, appear to be best suited to the tool’s current capabilities. For example, Whisk successfully generated a recognizable image of a Wilford Brimley plushie, even bypassing Google’s restrictions on celebrity images.
A more advanced editor, accessible by clicking “Start from scratch,” lets users supply text or a source image for each of the subject, scene, and style, plus an additional text input for further refinements. In its current state, however, the advanced controls don’t consistently produce results that match what the user asked for. For instance, an attempt to create a plushie image using the advanced editor produced a picture of someone vaguely resembling Wilford Brimley eating oatmeal in a lightbox rather than a plushie.
Google emphasizes that Whisk draws only from “a few key characteristics” of the source image, cautioning that the generated subject may differ in height, weight, hairstyle, or skin tone. This is because Whisk employs a two-step process: first, the Gemini language model generates a detailed caption of the uploaded image; then, this caption is fed into the Imagen 3 image generator. Consequently, the final output is based on Gemini’s description of the image, not the image itself.
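The two-step process described above can be illustrated with a minimal Python sketch. It is not Google’s implementation and does not call the real Gemini or Imagen 3 APIs: `whisk_like_pipeline`, `fake_captioner`, and `fake_generator` are hypothetical stand-ins, and the way the caption is combined with the style is an assumption. The point it shows is that the generator only ever sees text, so any detail the caption leaves out cannot appear in the output.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GeneratedImage:
    """Placeholder for an image result; it only records the prompt used."""
    prompt: str


def whisk_like_pipeline(
    source_image: bytes,
    style: str,
    captioner: Callable[[bytes], str],
    generator: Callable[[str], GeneratedImage],
) -> GeneratedImage:
    # Step 1: a captioning model turns the uploaded image into text.
    caption = captioner(source_image)
    # Step 2: only the caption (plus the chosen style) reaches the image
    # generator; the original pixels are never passed along.
    # The prompt format here is an assumption made for illustration.
    prompt = f"{caption}, rendered as a {style}"
    return generator(prompt)


def fake_captioner(image: bytes) -> str:
    # Hypothetical stand-in for the captioning step: it keeps only
    # "a few key characteristics" and drops everything else.
    return "an older man with a white mustache"


def fake_generator(prompt: str) -> GeneratedImage:
    # Hypothetical stand-in for the image generator.
    return GeneratedImage(prompt=prompt)


photo_a = b"pixels of one photo"
photo_b = b"pixels of a different photo"

result_a = whisk_like_pipeline(photo_a, "plushie", fake_captioner, fake_generator)
result_b = whisk_like_pipeline(photo_b, "plushie", fake_captioner, fake_generator)

print(result_a.prompt)
print(result_a == result_b)  # True: both photos collapsed to the same caption
```

Because two different photos that yield the same caption produce the same prompt, details such as height, weight, hairstyle, or skin tone survive only if the caption happens to mention them.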