DALL-E, Stable Diffusion, Midjourney, and similar tools all use a simple interface where users enter a text prompt to generate images.
Writing good prompts that precisely generate the kind of images you need takes experience and skill.
Phraser.tech
✅ Selecting aesthetic parameters
✅ Intuitive UI
✅ Supports multiple AI models
❌ No help in editing prompt subject
Creative Fabrica
✅ Selecting aesthetic parameters
✅ Intuitive UI
❌ No support for multiple AI models
❌ No help in editing prompt subject
promptoMANIA
✅ Selecting aesthetic parameters
❌ Unintuitive UI
✅ Supports multiple AI models
❌ No help in editing prompt subject
The common gap across all three tools was that they didn't help users improve the actual subject of the prompt; they focussed only on building a GUI around selecting aesthetic parameters.
From my competitor research, I could see that existing tools helped users select aesthetic parameters, but none helped them rephrase their prompt subject.
I came up with two ideas for how an AI model like ChatGPT or GPT-3 could be used to improve the subjects of prompts.
AI makes edits to the prompt
AI offers suggestions
I went with the suggestions approach because it was better suited to inspiring users to think about alternatives. Editing the prompt directly (Idea #1), on the other hand, seemed more prescriptive.
The Remix feature takes the prompt the user has typed and offers AI-generated variations that serve as inspiration.
Users can remix as many times as they need.
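To make the Remix interaction concrete, here is a minimal sketch of how the suggestions could be produced. It is an illustration under assumptions, not the implementation behind the prototype: the function names, the instruction wording, and the `generate` callable (a stand-in for whatever language model is used) are all hypothetical.

```python
import re
from typing import Callable, List

# Matches a leading list marker such as "1.", "2)", "-", "*" or "•".
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def build_remix_instruction(subject: str, n: int) -> str:
    """Build the instruction sent to the language model (wording is an assumption)."""
    return (
        f"Suggest {n} alternative phrasings of the subject of this "
        f"image prompt, one per line:\n{subject}"
    )


def remix(subject: str, generate: Callable[[str], str], n: int = 3) -> List[str]:
    """Return up to n suggested variations of a prompt subject.

    `generate` is a hypothetical callable that sends text to a language
    model and returns its reply as plain text.
    """
    reply = generate(build_remix_instruction(subject, n))
    suggestions: List[str] = []
    for line in reply.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        # Skip blank lines, echoes of the original subject, and repeats.
        if not cleaned or cleaned.lower() == subject.lower():
            continue
        if cleaned not in suggestions:
            suggestions.append(cleaned)
    return suggestions[:n]
```

Because the suggestions are returned as a list and the user's prompt is never modified, the user stays in control of what ends up in the prompt. Calling `remix` again with the same subject corresponds to pressing Remix another time.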
I also gave users more agency by letting them change the relative weights of terms so that each term has more or less influence on the final image.
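As a rough illustration of term weighting, the sketch below turns a set of weighted terms into a single prompt string. It assumes the `(term:weight)` attention syntax used by some Stable Diffusion front ends; other models use different syntax or none at all, and the case study does not specify which one the prototype targets. The function name and the example terms are hypothetical.

```python
from typing import Dict


def render_weighted_prompt(weights: Dict[str, float]) -> str:
    """Join terms into one prompt, marking any term whose weight is not 1.0.

    Assumes the "(term:weight)" syntax; a weight above 1.0 asks for more
    influence on the image and a weight below 1.0 asks for less.
    """
    parts = []
    for term, weight in weights.items():
        if weight <= 0:
            raise ValueError(f"weight for {term!r} must be positive")
        if weight == 1.0:
            parts.append(term)  # default weight needs no annotation
        else:
            parts.append(f"({term}:{weight:g})")
    return ", ".join(parts)


prompt = render_weighted_prompt(
    {"sleeping fox": 1.3, "birch forest": 1.0, "fog": 0.7}
)
print(prompt)  # (sleeping fox:1.3), birch forest, (fog:0.7)
```

In the interface, each weight would be driven by a control next to the term, so users adjust emphasis without having to learn the syntax.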
Thumbnails show the visual effect of different aesthetic parameters and users can select which ones they want reproduced in their images.
I took notes from the interviews, extracted snippets from them, and clustered the snippets inductively by shared theme. I then translated the resulting themes into insights. Two major themes stood out.
Participants reacted very positively overall to the Remix interaction. They liked the simplicity of the interaction, and the fact that it was AI-powered and reduced their cognitive load.
Participants had a lot of feedback about the visual parameters and the interface. The thumbnail and label alone did not make it clear how selecting a parameter would affect the final image. Low awareness of certain artists, a lack of visual context and reference, and thumbnails that showed only a generic landscape made it hard to set expectations for the final images.