I mean, what is prompt engineering anyway? This new role in the software engineering space is all about making the tweaks necessary to get exactly what you desire from a given generative AI tool, such as OpenAI's ChatGPT or Google's Bard on the writing side, or OpenAI's DALL-E and MidJourney for static visuals.
Now, MidJourney has been ahead of DALL-E for a while, but with the recent release of DALL-E 3, the two are in rough parity. I expect MidJourney to jump ahead again, but we'll see what happens.
GPT-4, the newer language model powering paid versions of ChatGPT, appears more capable today than the current iteration of Google's Bard. These are the primary tools I use from a generative AI perspective. I may go into experiments with other generative outliers in the future, covering things like animation, or music with vocals.
For today, let's focus on prompt engineering, and how AI might describe this practice itself. For this experiment, I am choosing MidJourney and ChatGPT powered by GPT-4.
Why did I do this?
I noticed, while setting up a new tag for prompt engineering on this blog, that it was the only tag for which I had not assigned a representative image. This image appears at the top of the tag page, which shows a list of posts with that tag. It also appears on the homepage to represent the category, at least with the theme I am using at the time of publication.
I felt I needed a decent image to represent the category. I could have gone looking on Unsplash or similar sites. My CMS does make it easy to do that.
I could have also done a Google image search and just pilfered someone's work uncredited. But that wouldn't be nice.
And what better way to get an image for prompt engineering, than with prompt engineering? Am I right?
The first thing I did was go straight to MidJourney and ask it for a hero image for a blog. I was errantly expecting MidJourney to understand the context of generative AI, which, duh, it's not designed to do.
The results for this prompt were cool looking, but probably not representative of what I ultimately wanted. I wasn't looking for "heroes" so much as an image to be placed as the "hero" on a page indexing posts about generative AI prompts. This is typical web design speak and just means the image that grabs your attention at the top of the page.
Since this wasn't what I wanted, I changed my prompt and tried again.
Here, again, MidJourney generated some interesting images. Any one of them would be cool to look at, maybe not as exciting as our heroes, but definitely more artistic.
I think MidJourney read the word "prompt" and interpreted it as promptness, as related to time. Contextually, it was intended to mean prompt as in to ask or elicit a response.
Well, after two tries, it's time to use our skills a bit better.
To start, we need language to describe prompt engineering. Something with a visual flourish we can use. This means our first stop should be ChatGPT, where we can ask it to describe the practice, then use or refine that description for MidJourney.
Note that in my prompt engineering posts, I will always represent the prompts and responses with callouts like the ones below. The "brain" represents the prompt, and, when the response is concise enough, the chat bubble represents the response. If the response is too long, I will put it into a collapsible content area.
Certainly! Let's describe AI prompt engineering using the metaphor of "painting a picture."
1. Blank Canvas (Starting Point):
Imagine that the process of interacting with an AI is like creating a painting. The blank canvas represents our initial state; nothing has been given to the AI.
2. Brush Strokes (Prompts):
Your prompts are the brush strokes on the canvas. Each stroke, or each word and phrase you use, guides the AI's output in a specific direction, just as every brush stroke affects the outcome of the painting.
3. Choice of Colors (Specificity):
Being specific in your prompts is like choosing the right colors for your painting. If you're vague, the outcome might be more abstract or unexpected. But if you choose your "colors" (words) with care and precision, the resulting "painting" (AI output) is more likely to match your vision.
4. Layering (Iterative Refinement):
Sometimes, one brush stroke isn't enough. You might need to add layers to the canvas, refining your image. Similarly, in prompt engineering, you might iterate and refine your prompts, building on the AI's responses to guide it to the desired outcome.
5. Guided Vision (Objective):
Before you start painting, you have a vision or a picture in your mind. With prompt engineering, you have an objective or an intended response from the AI. Your prompts guide the AI towards that vision.
6. Interacting Elements (Context):
In a painting, the background, foreground, and various elements interact and influence each other. In AI prompting, previous interactions and context can influence the AI's response, much like how different parts of a painting come together.
7. The Artist's Touch (Expertise):
Just as an experienced painter knows how to blend colors and guide the brush to create their desired image, an expert in AI prompt engineering knows how to craft and adjust prompts to obtain optimal results from the AI.
8. The Final Picture (Output):
Once you're done adding all your brush strokes, layering, and adjusting, you step back to admire the final painting. Similarly, after carefully crafting your prompts and guiding the AI, you get the output or response which ideally should match or come close to your desired result.
So, AI prompt engineering is like guiding a painting process, where every prompt, refinement, and iterative interaction helps create a "masterpiece" of information or solution that the user desires.
Now sure, we could take this response verbatim and feed it to MidJourney, but in my experience, it tends to work best with a paragraph or less when generating usable imagery. At least up to v5.2 of the MidJourney model in use at the time of this post's publication.
The long description provided by ChatGPT is a good summation, but it's simply too wordy for MidJourney to make sense of given its focus.
Since ChatGPT pays attention to the context and history of our conversation, we can simply reference the earlier ask and its response in any immediate follow-up questions.
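As an aside for anyone scripting this with the OpenAI API instead of the web UI, the same follow-up trick maps onto carrying the earlier turns in the `messages` list you send with each request. This is only a rough sketch; the prompt strings and the helper function name here are illustrative placeholders, not the exact prompts I used.

```python
# Sketch: how conversation context carries into a follow-up request.
# The prompt text below is a hypothetical placeholder.

def build_followup_messages(first_prompt, first_response, followup_prompt):
    """Assemble the full message history so a follow-up like
    "slim that down" is interpreted in the context of the earlier ask."""
    return [
        {"role": "user", "content": first_prompt},
        {"role": "assistant", "content": first_response},
        {"role": "user", "content": followup_prompt},
    ]

messages = build_followup_messages(
    "Describe AI prompt engineering using a painting metaphor.",
    "Certainly! Let's describe AI prompt engineering...",  # the earlier reply
    "Can you slim that down to a paragraph or less?",
)

# With the official client, the call would then look roughly like:
# client.chat.completions.create(model="gpt-4", messages=messages)
```

The web UI does this bookkeeping for you, which is why a bare "slim that down" works at all.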
So, let's ask ChatGPT to slim that down a bit.
Alright! Let's paint a vivid picture that encapsulates the process of AI prompt engineering:
The AI Prompt Engineering Gallery
We're inside an opulent art gallery. The walls are a soft, off-white, allowing the colorful paintings to pop. The gallery is illuminated by elegant chandeliers, their light gleaming off the polished marble floor.
At the center of the gallery is a massive easel holding a work-in-progress canvas.
1. Blank Canvas:
The canvas starts off blank, symbolizing the untouched state of the AI. It’s pristine, full of potential.
2. Palette and Brushes:
To the left of the canvas stands a table holding a palette loaded with vibrant paints and an array of brushes. This represents the myriad ways one can craft prompts. The brushes of different sizes signify the various levels of specificity and detail one might employ.
3. Hand (User):
A painter's hand (representing the user) is poised with a brush, carefully selecting colors from the palette, indicating the choice of words and the precision required in crafting prompts.
4. Strokes on Canvas:
The hand has started to paint on the canvas. Some strokes are definite and bold, some are soft and delicate, while others are being layered over previous ones. These represent the variety of prompts, iterations, and refinements.
5. Spectacles (Clarity):
On the table, next to the palette, rests a pair of spectacles. This signifies the need for clarity and vision. Periodically, the hand places the spectacles on, taking a moment to observe and adjust, symbolizing the iterative adjustments in prompting for clearer outcomes.
6. Shadowy Figures (AI Responses):
As the hand continues to paint, shadowy figures start to form on the canvas, representing the AI's responses. Initially, some are abstract, while others start taking a clear shape, based on the precision of the brush strokes.
7. Rear View Mirror (Past Context):
To the right of the canvas, a vintage rear-view mirror stands on a pedestal. The mirror reflects parts of older paintings (past interactions) from the gallery walls, symbolizing how past prompts and responses can influence current outcomes.
8. Golden Frame (Final Output):
Around the canvas, there's an ornate golden frame, but it’s not fully formed. As the painting nears completion, the frame becomes more defined, representing the AI's refined output based on effective prompt engineering.
The entire scene depicts the dynamic, interactive, and iterative nature of AI prompting. The gallery ambiance underscores the art and skill involved in the process, while each element in the scene embodies a facet of the prompting journey.
My intent here was to get ChatGPT to shorten its explanation, which it did. But the result remained wordier than usable, though artistically descriptive in a way MidJourney could work with, provided I made some edits, like removing the numbers from the ordered list.
I think we can still do better. Since ChatGPT remembers our earlier asks, we can keep things nice and simple here. Let's just ask for a short summary.
Now, I still think this is a bit long for what I would give to MidJourney myself. But it's within the bounds of reason. And it did a pretty nice job generating topical visuals on the first try.
This was a fun little experiment and an easy way to generate something useful and representative for a blog.
How do you generate your content? What generative AI tools might be useful for what you do?
Be sure to subscribe, so you can get notified of new posts, and follow my adventures!