/magine

Making AI image generation easier
Course project @UC Berkeley, Emerging Tech, UI Design

00 / PROJECT DETAILS

Duration
3 weeks
Team
Solo project
Role
Design and testing
Problem

As AI image generation becomes popular, more and more people are interacting with models like Dall-E, Midjourney and Stable Diffusion. The current mode of interaction is limited to writing prompts, and writing prompts good enough to get precisely the images one needs is challenging.

Solution

I designed an interface that uses AI and a simple GUI to take the cognitive load of writing every detail of a prompt off the user.

01 / SUMMARY IN A MINUTE

Problem Space

  • Image generation with tools like Dall-E, Midjourney and Stable Diffusion relies on prompts.
  • Writing good prompts is hard.

The Solution

/magine (read 'imagine') is a web-based tool that uses AI and a simple GUI to help users write effective prompts.

    02 / PROCESS OVERVIEW

    03 / UNDERSTANDING AI IMAGE GENERATION

    AI image generation is all about prompts

    • Dall-E, Stable Diffusion, Midjourney etc. all use a simple interface where users can input a prompt to generate images.

    • Beginners tend to write simple prompts, which often produce images that don't match their expectations.
    • Writing good prompts that precisely generate the kind of images you need takes experience and skill.

    • Prompt engineering is a complex domain. Good prompts are so valuable that they are sold on marketplaces.

    04 / RESEARCH

    I spent time on the Midjourney Discord Server observing how users write prompts

    Findings

    Good prompts are complex, detailed and specific.

    Expert users generate precisely the images they want by constructing elaborate, specific prompts. They add parameters at the end of their prompts to produce the aesthetic they want in the image.

    The anatomy of a prompt

    The prompt contains the subject and some aesthetic parameters that influence the final image.

    For example:
    A watercolor painting of a tree made of chocolate in the style of Hokusai, bright colors, 4:3
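    The anatomy above can be sketched as a small data structure. This is only an illustration, not how /magine is built: the `Prompt` class, `split_prompt` helper and the rule "everything before the first comma is the subject" are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Prompt:
    """A prompt split into a subject and trailing aesthetic parameters (hypothetical model)."""
    subject: str
    parameters: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Subject first, then each non-empty parameter, comma-separated.
        parts = [self.subject.strip()]
        parts += [p.strip() for p in self.parameters if p.strip()]
        return ", ".join(parts)


def split_prompt(text: str) -> Prompt:
    """Naive split: text before the first comma is the subject (an assumption)."""
    head, *tail = [part.strip() for part in text.split(",")]
    return Prompt(subject=head, parameters=[t for t in tail if t])


example = Prompt(
    subject="A watercolor painting of a tree made of chocolate in the style of Hokusai",
    parameters=["bright colors", "4:3"],
)
print(example.render())
```

    Keeping the subject separate from the parameters is what lets a GUI edit one without touching the other.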

    05 / COMPETITOR ANALYSIS

    Some existing tools already help users create prompts for AI models

    Phraser.tech

    ✅ Selecting aesthetic parameters

    ✅ Intuitive UI

    ✅ Supports multiple AI models

    ❌ No help in editing prompt subject

    Creative Fabrica

    ✅ Selecting aesthetic parameters

    ✅ Intuitive UI

    ❌ No support for multiple AI models

    ❌ No help in editing prompt subject

    promptoMANIA

    ✅ Selecting aesthetic parameters

    ❌ Unintuitive UI

    ✅ Supports multiple AI models

    ❌ No help in editing prompt subject

    The common gap across all the tools was that they didn't help users improve the actual subject of the prompt; they focussed only on building a GUI around selecting aesthetic parameters.

    06 / OPPORTUNITY AREA

    How can we help users improve their prompt subjects?

    My competitor research showed that existing tools help users select aesthetic parameters, but none of them help users rephrase their prompt subject.

    07 / INITIAL IDEAS

    Using ChatGPT/GPT-3 to improve prompts

    I came up with 2 different ideas for how an AI model like ChatGPT or GPT-3 could be used to improve the subjects in prompts.

    Idea #1

    AI makes edits to the prompt

    Idea #2

    AI offers suggestions

    I decided to go with Idea #2

    I chose the suggestions approach because it was better suited to inspiring users to think about alternatives. Editing the prompt (Idea #1), on the other hand, seemed more prescriptive.

    08 / FINAL DESIGN

    Remixing a prompt using AI↓

    The Remix feature takes the typed prompt and offers AI-generated variations that serve as inspiration.

    Users can remix as many times as they need.
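    A minimal sketch of how a Remix request could work, under stated assumptions: the instruction wording, the `remix` function and the `complete` callable (a stand-in for whatever text model is used) are hypothetical and not taken from the actual design. A fake model is used here so the example runs offline.

```python
from typing import Callable


def build_remix_request(subject: str, n: int = 3) -> str:
    """Compose the instruction sent to a text model (wording is hypothetical)."""
    return (
        f"Suggest {n} alternative phrasings of this image prompt subject, "
        f"one per line, keeping the core idea:\n{subject}"
    )


def remix(subject: str, complete: Callable[[str], str], n: int = 3) -> list[str]:
    """Return up to n suggestions; the user's own prompt is never modified."""
    reply = complete(build_remix_request(subject, n))
    suggestions = [line.strip() for line in reply.splitlines() if line.strip()]
    return suggestions[:n]


def fake_model(request: str) -> str:
    # Stand-in for a real model call; returns canned text for illustration.
    return "A chocolate tree in a misty forest\n\nA tree sculpted from dark chocolate\n"


ideas = remix("a tree made of chocolate", fake_model, n=2)
print(ideas)
```

    Returning suggestions as a list, rather than rewriting the prompt in place, mirrors the choice of Idea #2: the user stays in control and can call `remix` again for a fresh set.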

    Changing weights of terms↓

    To give users more agency, I let them change the relative weights of terms so that each term has more or less influence on the final image.
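    One way term weights could be represented and rendered is sketched below. The `text::weight` notation is an assumption loosely modeled on multi-prompt weighting syntax; the source does not say which syntax the tool emits, and the `adjust` and `render_weighted` helpers and the 0 to 2 weight range are hypothetical.

```python
def adjust(terms: dict[str, float], term: str, delta: float,
           lo: float = 0.0, hi: float = 2.0) -> dict[str, float]:
    """Return a copy of terms with one weight nudged and clamped to [lo, hi]."""
    updated = dict(terms)
    updated[term] = min(hi, max(lo, updated[term] + delta))
    return updated


def render_weighted(terms: dict[str, float]) -> str:
    """Join terms as `text::weight` (assumed notation), in insertion order."""
    return " ".join(f"{text}::{weight:g}" for text, weight in terms.items())


terms = {"tree made of chocolate": 1.0, "bright colors": 1.0}
terms = adjust(terms, "tree made of chocolate", +0.5)
print(render_weighted(terms))
```

    Clamping the weight keeps a slider or stepper control in the GUI from producing values outside the range the sketch assumes.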

    Selecting Aesthetic Parameters↓

    Thumbnails show the visual effect of different aesthetic parameters and users can select which ones they want reproduced in their images.

    09 / TESTING THE DESIGN

    I tested the concept with 5 design students

    I took notes during the interviews, extracted snippets from them and clustered the snippets inductively by shared theme. I then translated the resulting themes into insights. Two major themes stood out.

    Theme 1: Overall Positive Reaction to the Remix Feature

    Participants reacted very positively to the Remix interaction. They liked its simplicity and the fact that it was AI-powered and reduced their cognitive load.

    Theme 2: Participants found it hard to understand how exactly an aesthetic parameter would affect the final image.

    Participants had a lot of feedback about the visual parameters and the interface. After selecting a parameter, the thumbnail and its label did not make it clear how the final image would look. Low awareness of certain artists, a lack of visual context and reference, and thumbnails that showed only a generic landscape made it hard to set expectations for the final images.

    10 / FUTURE WORK