00 / PROJECT DETAILS

Duration

3 weeks

Team

Solo project

Role

Design and testing

Problem

With AI image generation becoming popular, more and more people are interacting with models like DALL-E, Midjourney and Stable Diffusion. The current mode of interaction is limited to writing prompts, and writing a prompt that yields exactly the image one needs is challenging.

Solution

I designed an interface that uses AI and a simple GUI to take the cognitive load of writing a fully detailed prompt off the user.

03 / UNDERSTANDING AI IMAGE GENERATION

AI image generation is all about prompts

DALL-E, Stable Diffusion, Midjourney and similar models all use a simple interface where users input a prompt to generate images.
Beginners tend to write simple prompts, which often produce images that don't match their expectations.
Writing good prompts that precisely generate the kind of images you need takes experience and skill.
Prompt engineering is a complex domain. Good prompts are so valuable that they are sold on marketplaces.

04 / RESEARCH

I spent time on the Midjourney Discord Server observing how users write prompts

Findings

Good prompts are complex, detailed and specific.

Expert users can generate precisely the images they want by constructing elaborate and specific prompts. They add parameters at the end of their prompts to produce the aesthetic they want in the image.

The anatomy of a prompt

The prompt contains the subject and some aesthetic parameters that influence the final image.

For example:
A watercolor painting of a tree made of chocolate in the style of Hokusai, bright colors, 4:3
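
To make this anatomy concrete, here is a minimal Python sketch that models a prompt as a subject plus a list of aesthetic parameters and joins them into one string. The `Prompt` class, its field names, and the way the example is split into subject and parameters are illustrative assumptions, not part of any image model's API.

```python
from dataclasses import dataclass, field


@dataclass
class Prompt:
    """Hypothetical model of a prompt: one subject plus aesthetic parameters."""
    subject: str
    parameters: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Join the subject and every non-empty parameter with commas.
        parts = [self.subject.strip()]
        parts += [p.strip() for p in self.parameters if p.strip()]
        return ", ".join(parts)


# One possible reading of the example above: everything up to the first
# comma is treated as the subject, the rest as aesthetic parameters.
example = Prompt(
    subject="A watercolor painting of a tree made of chocolate in the style of Hokusai",
    parameters=["bright colors", "4:3"],
)
print(example.render())
```

Keeping the subject and the parameters separate is what lets an interface edit one without touching the other.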

05 / COMPETITOR ANALYSIS

Tools already existed that helped users create prompts for AI models

Phraser.tech

✅ Selecting aesthetic parameters

✅ Intuitive UI

✅ Supports multiple AI models

❌ No help in editing prompt subject

Creative Fabrica

✅ Selecting aesthetic parameters

✅ Intuitive UI

❌ No support for multiple AI models

❌ No help in editing prompt subject

promptoMANIA

✅ Selecting aesthetic parameters

❌ Unintuitive UI

✅ Supports multiple AI models

❌ No help in editing prompt subject

The common gap across all three tools was that none of them helped users improve the actual subject of the prompt; they focused only on building a GUI for selecting aesthetic parameters.

07 / INITIAL IDEAS

Using ChatGPT/GPT-3 to improve prompts

I came up with two ideas for how an AI model like ChatGPT or GPT-3 could be used to improve the subject of a prompt.

Idea #1

AI makes edits to the prompt

Idea #2

AI offers suggestions

I decided to go with Idea #2

The suggestions approach was better suited to inspiring users to think about alternatives. Editing the prompt directly (Idea #1), on the other hand, seemed more prescriptive.

08 / FINAL DESIGN

Remixing a prompt using AI

The Remix feature takes the typed prompt and offers AI-generated variations, which serve as inspiration.

Remixing can be done as many times as needed by the user.
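
The Remix flow can be sketched roughly as follows, assuming the variations come from some language-model call. Here that call is replaced by a hypothetical stand-in, `fake_suggest`, so the example runs on its own; the function names and the `limit` parameter are assumptions made for illustration and do not describe how the prototype was built.

```python
from typing import Callable, List


def remix(prompt: str, suggest: Callable[[str], List[str]], limit: int = 3) -> List[str]:
    """Return up to `limit` distinct variations that differ from the original prompt."""
    seen = {prompt.strip().lower()}
    variations: List[str] = []
    for candidate in suggest(prompt):
        key = candidate.strip().lower()
        # Skip empty strings, repeats, and the unchanged original prompt.
        if key and key not in seen:
            seen.add(key)
            variations.append(candidate.strip())
        if len(variations) == limit:
            break
    return variations


def fake_suggest(prompt: str) -> List[str]:
    # Hypothetical stand-in for a language-model call (assumption, not a real API).
    return [
        f"{prompt}, at sunset",
        prompt,
        f"{prompt}, seen from above",
        f"{prompt}, at sunset",
    ]


print(remix("a tree made of chocolate", fake_suggest))
```

Because `remix` only returns suggestions and never overwrites the typed prompt, the user stays in control of which variation, if any, to adopt.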

Changing weights of terms

I also gave users more agency by letting them change the relative weights of terms, so that a term has more or less influence on the final image.
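
As a minimal sketch of what changing weights could produce under the hood, the Python below turns a mapping of terms to relative weights into a single prompt string. The `term::weight` notation is modelled on Midjourney's multi-prompt weights and is an assumption about the target model; other models use different syntax, and `render_weighted` is a hypothetical helper, not part of the design.

```python
def render_weighted(terms: dict[str, float]) -> str:
    """Format each term as `term::weight`, keeping the order the user set."""
    parts = []
    for term, weight in terms.items():
        # `:g` drops trailing zeros, so 2.0 is written as 2 and 0.5 stays 0.5.
        parts.append(f"{term.strip()}::{weight:g}")
    return " ".join(parts)


weighted = render_weighted({
    "tree made of chocolate": 2,
    "bright colors": 1,
    "watercolor": 0.5,
})
print(weighted)
```

A slider in the GUI would then only need to update one number in the mapping, and the prompt string can be regenerated from it.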

Selecting Aesthetic Parameters

Thumbnails show the visual effect of different aesthetic parameters and users can select which ones they want reproduced in their images.

09 / TESTING THE DESIGN

I tested the concept with 5 design students

I took notes during the interviews, extracted snippets from them, and clustered the snippets inductively by shared theme. I then translated the resulting themes into insights. Two major themes stood out.

Theme 1: Overall Positive Reaction to the Remix Feature

Participants reacted very positively to the Remix interaction. They liked its simplicity, the fact that it was AI-powered, and that it reduced their cognitive load.

Theme 2: Participants found it hard to understand how exactly an aesthetic parameter would affect the final image.

Participants had a lot of feedback about the visual parameters and the interface. The thumbnail and label alone did not make it clear how a selected parameter would affect the final image. Low awareness of certain artists, a lack of visual context and reference, and thumbnails that showed only a generic landscape made it hard to set expectations for the final images.

10 / FUTURE WORK

Based on the user testing, the Remix feature worked well and seems like a step in the right direction. It can be improved further by adding a history of prompts used and generated, so that users can compare multiple versions of a prompt and go back to a more effective one.
The interface could also be made more conversational, rather than just generating one alternative suggestion, similar to how ChatGPT can revise its previous responses.
This is an emerging area of work, and AI image generation is still limited to natural language inputs. I'm planning to explore visual input interfaces for this technology, such as input images or sketching, as a mode of editing images.

/magine

01 / SUMMARY IN A MINUTE

Problem Space

The Solution

/magine (read 'imagine') is a web-based tool that uses AI and a simple GUI to help users write effective prompts.

02 / PROCESS OVERVIEW

06 / OPPORTUNITY AREA

How can we help users improve their prompt subjects?
