| name | description |
|---|---|
image-prompt |
Generate optimized prompts for Google's Nano Banana image generators (Nano Banana 2 and Nano Banana Pro). Use when the user invokes /image-prompt or asks to create, write, craft, or generate a prompt for Nano Banana, Gemini image generation, or AI image creation. Also triggers when the user says "image prompt", "nano banana prompt", "nano banana 2", or "help me describe an image".
|
Generate high-quality, optimized prompts for Google's Nano Banana image generators, based on Google's official prompting guidance.
Both models use the same prompting approach. Help the user pick the right one if they're unsure:
| Nano Banana 2 | Nano Banana Pro | |
|---|---|---|
| Speed | Fast (Flash-based) | Slower, highest fidelity |
| Best for | Rapid iteration, everyday use | Maximum detail and factual accuracy |
| Default in | Gemini app, Google Search | Premium plans (AI Pro/Ultra) |
| Resolution | 512px to 4K (specify explicitly) | Up to high-res |
| Characters | Up to 5 consistent characters | Via reference images |
| Objects | Up to 14 consistent objects | Up to 14 reference images |
| Instruction following | Enhanced — handles complex prompts well | Strong |
Nano Banana 2 is the default for most users. Recommend Pro only when the user needs maximum fidelity or highly accurate real-world detail.
Note: Model availability and plan details may change. Check Google's current documentation if the above table seems outdated.
Both models understand intent, physics, and composition. They reward clear creative direction over keyword lists.
-
Natural language over tag soup — Write as if briefing a human artist, not listing keywords.
- BAD: "dog, park, sunset, 4k, realistic, cinematic"
- GOOD: "A golden retriever bounding through a sun-dappled park at golden hour, shot from a low angle with shallow depth of field"
-
Specificity matters — Define subjects precisely with materiality, texture, and detail.
- Instead of "a woman": "a sophisticated elderly woman wearing a vintage Chanel-style tweed suit"
- Include materials: "matte finish," "brushed steel," "soft velvet," "weathered leather"
-
Provide context about purpose — Mention the use case or audience.
- "Create a hero image for a premium coffee brand's website" helps the model infer professional lighting, composition, and mood.
-
Edit, don't re-roll — When a generated image is mostly correct, request specific conversational changes rather than starting over.
Ask the user to describe what they want. If their description is vague, ask targeted questions to fill in these dimensions:
| Dimension | What to ask | Examples |
|---|---|---|
| Subject | Who or what is the main focus? | Person, object, scene, abstract concept |
| Setting/Environment | Where does this take place? | Urban, nature, studio, fantastical |
| Mood/Atmosphere | What feeling should it evoke? | Serene, dramatic, playful, mysterious |
| Style | What visual style? | Photorealistic, watercolor, cinematic, editorial, retro, anime |
| Composition | How should it be framed? | Close-up, wide shot, bird's eye, rule of thirds |
| Lighting | What lighting conditions? | Golden hour, dramatic side-light, soft diffused, neon |
| Purpose/Use case | Where will this image be used? | Social media, website hero, print poster, presentation |
| Text (if any) | Any text to render in the image? | Put exact text in quotation marks |
| Resolution | What resolution/aspect ratio? | 512px, 1080p, 4K; 16:9, 9:16, 1:1 |
Do not ask all questions at once. Start with the most important gaps based on what the user already provided. Ask 2-3 questions maximum per round.
Construct the prompt using this structure (not all elements are required — use what's relevant):
[Style/medium] of [specific subject with details] in [setting/environment],
[action or pose], [lighting description], [mood/atmosphere],
[camera angle/composition], [additional details: texture, color palette, materiality].
[Purpose context if relevant.]
Key techniques to apply:
- Camera language: Use cinematic terms — "wide establishing shot," "tight close-up," "over-the-shoulder," "Dutch angle," "shallow depth of field"
- Lighting specifics: "Rembrandt lighting," "backlit with rim light," "soft window light from the left," "dramatic chiaroscuro"
- Material and texture: "brushed aluminum," "hand-knit wool," "cracked leather," "translucent glass"
- Color direction: "muted earth tones," "high-contrast complementary colors," "monochromatic blue palette"
- Text rendering: Place exact text in quotation marks — Nano Banana has state-of-the-art text legibility
- Resolution: For Nano Banana 2, explicitly request resolution (512px up to 4K) when it matters
- Aspect ratio: Suggest appropriate ratio for the use case (16:9 landscape, 9:16 portrait, 1:1 square)
Output the prompt in a clearly marked block, followed by brief rationale notes explaining key choices:
PROMPT:
─────────────────────────────────────────────
[The generated prompt text]
─────────────────────────────────────────────
WHY THESE CHOICES:
- [Element]: [Brief explanation of why this detail was included]
- [Element]: [How it helps the model produce better results]
After presenting the prompt, ask the user:
"Would you like to adjust anything? I can refine the style, composition, lighting, mood, or any other element. You can also ask me to create a variant with a different approach."
Continue iterating until the user is satisfied.
Mention these to the user when relevant:
Both models support multi-image context for consistency:
- Nano Banana 2: Up to 5 consistent characters and 14 consistent objects per workflow
- Nano Banana Pro: Up to 14 reference images (6 with high fidelity)
Instruct the model with:
- "Use the uploaded images as a strict style reference"
- "Keep the character from Image 1 but place them in the setting from Image 2"
- For character consistency: "Keep facial features exactly the same as the reference"
- For storyboarding: "The identity and attire of all characters must stay consistent throughout"
After generating an image, the user can type natural-language edits:
- "Change the sunny day to a rainy night"
- "Remove the person in the background and add a potted plant"
- "Make the text neon blue instead of white" The model adjusts lighting, reflections, and physics automatically.
Both models have state-of-the-art text rendering and support translation/localization of text within images. Always put exact text in quotation marks in the prompt. Specify text style if needed: "bold sans-serif," "handwritten script," "retro neon sign." For localization, try: "Translate the text in this image to Japanese."
The model can convert between 2D and 3D:
- Hand-drawn sketches to photorealistic renders
- Floor plans to 3D room visualizations
- Wireframes to high-fidelity UI mockups
Users can upload layout sketches to control composition:
- Hand-drawn sketches defining where text and objects should go
- Grid layouts for tile-based designs
- Wireframes for UI mockup generation
When reviewing or building prompts, watch for and fix these:
- Tag soup: Disconnected keyword lists — rewrite as natural sentences
- Vague subjects: "a person" or "a building" — add specific details
- Missing mood/lighting: These dramatically affect output quality — always include them
- No context: Omitting the purpose/use case — include it when relevant
- Over-prompting: Contradictory or excessive details that confuse the model — keep it coherent and focused
Use these as calibration for quality:
Product photography:
A flat lay of artisanal coffee beans spilling from a matte black ceramic cup onto a weathered oak table, soft directional window light from the upper left, warm earth tones with deep shadows, shot from directly above, styled for a premium coffee brand's Instagram feed.
Portrait:
A cinematic medium close-up portrait of a jazz musician mid-performance, eyes closed, sweat glistening under warm amber stage lighting, shallow depth of field with bokeh from string lights in the background, shot on what looks like 35mm film with natural grain.
Text-heavy design:
A vintage-style concert poster with the text "MIDNIGHT REVERIE" in bold art deco typography at the top, a silhouette of a saxophone player against a deep indigo night sky with a full moon, "Live at The Blue Note — March 15, 2026" in smaller elegant serif type at the bottom, gold and navy color palette.
Fantasy/Illustration:
A lush watercolor illustration of a hidden forest library, towering bookshelves made from living trees with glowing mushrooms as reading lamps, a cozy armchair draped in moss-green velvet, shafts of golden sunlight filtering through the canopy above, whimsical and enchanting atmosphere.
Prompting advice in this skill is based on:
- 7 tips to get the most out of Nano Banana Pro — Google Blog
- Nano Banana 2: Combining Pro capabilities with lightning-fast speed — Google Blog
- Nano Banana image generation — Google AI for Developers
Note that the installation instructions are in the header, but if that's not visible:
Installation
Create the skill directory:
mkdir -p ~/.claude/skills/image-prompt
Copy this file to:
~/.claude/skills/image-prompt/SKILL.md
Restart Claude Code (or start a new session). The skill will be
available as /image-prompt, and will also auto-trigger when you
mention "nano banana prompt", "image prompt", "help me describe
an image", or similar phrases.