@bedwards
Last active January 19, 2026 14:12

Nano Banana (Gemini 2.5 Flash Image) and Nano Banana Pro (Gemini 3 Pro Image) are Google models for quick and efficient image analysis and creation. These are available through the Gemini API with SDKs (Python, Node.js, etc.), the Files API for large uploads, Firebase, or a CLI. They allow for tasks like visual reasoning, editing with prompts (object removal, pose changes), and complex image analysis within apps. An API key and prompt engineering are needed for optimal results.

Key Concepts

Nano Banana (Gemini 2.5 Flash Image): It focuses on speed and efficiency for tasks involving a high volume of images. It is effective for consistent character generation and prompt-based editing.
Nano Banana Pro (Gemini 3 Pro Image): This offers advanced reasoning, search grounding, and high-fidelity output for complex creative projects.

How to Use

Get an API Key: Set up authentication with Google AI services.
Use the Gemini API: Send requests with multimodal input (text and images).
SDKs: Use client libraries (Python, JS, etc.) for easier integration.
Files API: Upload larger images or those used repeatedly for better efficiency.
Inline Images: Pass Base64 encoded images directly in the request for smaller inputs.

Capabilities

Image Understanding: Analyze diagrams, sketches, and visual content.
Prompt-Based Editing: Remove objects, change poses, or perform local edits with text prompts.
Visual Reasoning: Solve hand-drawn equations, understand complex scenes.
Document Processing: Analyze entire PDFs, extracting info from text, images, and tables.
Consistent Generation: Create consistent characters or products across images.

Tools & Interfaces

Gemini API/SDKs: This is the main way to integrate into apps.
Gemini CLI: This is for terminal-based image analysis (e.g., gemini analyze image.png).
Firebase AI Logic: This allows direct integration into apps.
Google AI Studio: This is for experimenting with models visually.

Example Workflow (Conceptual)

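    # Using the Python SDK (simplified; assumes the google-genai package
    # is installed and GEMINI_API_KEY is set in the environment)
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment
    with open("my_banana.jpg", "rb") as f:  # read the local image as bytes
        image_data = f.read()
    image = types.Part.from_bytes(data=image_data, mime_type="image/jpeg")
    response = client.models.generate_content(
        model="gemini-2.5-flash-image",  # Nano Banana
        contents=["Describe this image.", image],
    )
    print(response.text)

Key Takeaway: Nano Banana allows developers to build intelligent applications that understand and manipulate visual information, integrating text and images.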
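The "Inline Images" option listed under How to Use can also be sketched without an SDK, by building the JSON body for a `generateContent` request directly. This is a minimal sketch, not a definitive client: the helper names `build_inline_request` and `send_request` are hypothetical, and the endpoint URL, the `x-goog-api-key` header, and the `GEMINI_API_KEY` environment variable are assumptions that should be checked against the current Gemini API reference.

```python
import base64
import json
import os
import urllib.request

# Assumed REST endpoint pattern for the Gemini API; verify against current docs.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_inline_request(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a request body with one text part and one Base64 inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


def send_request(body: dict, model: str = "gemini-2.5-flash-image") -> dict:
    """POST the body to the API. Needs network access and GEMINI_API_KEY (assumed name)."""
    request = urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Passing the bytes of `my_banana.jpg` to `build_inline_request("Describe this image.", ...)` and handing the result to `send_request` would return the parsed JSON response. Inline data suits small inputs; for larger or repeatedly used images, the Files API mentioned above is the better fit.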

It is possible to use Google's Nano Banana models (Gemini 2.5 Flash Image, or Gemini 3 Pro Image for Nano Banana Pro) to analyze images within an Anthropic Claude agent. This can be done programmatically using the Claude Agent SDK, a CLI, or the API.

Integration Methods

The Claude Agent SDK uses "tools" and "agentic capabilities" such as bash command execution and file system access. These allow interaction with external systems and AI models like Gemini.

Model Context Protocol (MCP) Servers: An MCP server can act as a bridge between the Claude agent and the Gemini API. The Claude agent, through the SDK, calls this server, which sends the image to the Gemini model and returns the result to the agent. More information can be found in the Composio documentation.
Command Line Interface (CLI) Tools: The Claude agent can use bash commands to call a custom CLI tool that sends images to the Gemini API for analysis. This lets the Claude agent manage the image analysis process.
Direct API Calls (within a tool): A custom tool can be defined within the Claude Agent SDK code. This tool makes direct API calls to the Gemini API (Google AI Studio) or Vertex AI endpoints, which exposes Nano Banana's image analysis and generation capabilities.
Third-party Libraries: Libraries like litellm can act as a wrapper, allowing use of different models, including Gemini, through an OpenAI-compatible interface.

Key Considerations

API Keys: Valid API keys are needed for both Anthropic's Claude and Google's Gemini/Vertex AI services.
Tool Definition: The necessary "tools" must be defined in the Claude agent's configuration. This specifies how to interact with the external Gemini functionality.
Data Transfer: Images can be handled by passing file paths or content directly to the external tool/API call. This uses the Claude SDK's file system access capabilities.
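The "Tool Definition" and "Direct API Calls" points can be sketched as a tool schema in the `name` / `description` / `input_schema` shape used by the Anthropic Messages API, plus a dispatcher that routes a tool call to an analysis function. Everything named here is hypothetical: the tool name `analyze_image`, the `handle_tool_call` helper, and the injected `analyzer` callable, which in a real agent would wrap a Gemini API call. The Claude Agent SDK has its own tool registration mechanism, which this sketch does not show.

```python
from pathlib import Path
from typing import Callable

# Hypothetical tool schema in the name / description / input_schema shape.
ANALYZE_IMAGE_TOOL = {
    "name": "analyze_image",
    "description": "Send a local image to a Gemini image model and return its text analysis.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the image file."},
            "prompt": {"type": "string", "description": "What to ask about the image."},
        },
        "required": ["path"],
    },
}

# An analyzer takes (prompt, image_bytes) and returns the model's text.
Analyzer = Callable[[str, bytes], str]


def handle_tool_call(name: str, tool_input: dict, analyzer: Analyzer) -> dict:
    """Run one tool call and return a dict with the result text and an error flag."""
    if name != ANALYZE_IMAGE_TOOL["name"]:
        return {"content": f"Unknown tool: {name}", "is_error": True}
    path = Path(tool_input["path"])
    if not path.is_file():
        return {"content": f"File not found: {path}", "is_error": True}
    prompt = tool_input.get("prompt", "Describe this image.")
    try:
        text = analyzer(prompt, path.read_bytes())
    except Exception as exc:  # report the failure to the agent instead of crashing
        return {"content": f"Analysis failed: {exc}", "is_error": True}
    return {"content": text, "is_error": False}
```

The `content` and `is_error` keys mirror fields of a Messages API `tool_result` block, which additionally carries the `tool_use_id` of the call being answered. Injecting the analyzer keeps the Gemini key and client out of the dispatcher and makes it easy to test with a stub.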
By using these methods, a Claude agent can use Nano Banana for tasks such as image reading and analysis.
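The CLI route described under Integration Methods could look roughly like the script below, which an agent might invoke through a bash command. This is a sketch under stated assumptions: the script's flags and JSON output shape are hypothetical, and the `describe` function assumes the `google-genai` package is installed and a `GEMINI_API_KEY` environment variable is set. The SDK is imported lazily so that argument parsing works without it.

```python
import argparse
import json
import mimetypes
import sys


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the command line for this hypothetical image-description tool."""
    parser = argparse.ArgumentParser(description="Describe an image with a Gemini model.")
    parser.add_argument("image", help="path to the image file")
    parser.add_argument("--prompt", default="Describe this image.")
    parser.add_argument("--model", default="gemini-2.5-flash-image")
    return parser.parse_args(argv)


def guess_mime_type(path: str) -> str:
    """Guess the MIME type from the file extension, falling back to JPEG."""
    mime_type, _ = mimetypes.guess_type(path)
    return mime_type or "image/jpeg"


def describe(path: str, prompt: str, model: str) -> str:
    """Call the Gemini API. Assumes google-genai is installed and GEMINI_API_KEY is set."""
    from google import genai  # imported lazily so parsing works without the SDK
    from google.genai import types

    with open(path, "rb") as f:
        image = types.Part.from_bytes(data=f.read(), mime_type=guess_mime_type(path))
    client = genai.Client()
    response = client.models.generate_content(model=model, contents=[prompt, image])
    return response.text


def main(argv: list[str]) -> int:
    """Print a JSON result on stdout, or a JSON error on stderr with exit code 1."""
    args = parse_args(argv)
    try:
        text = describe(args.image, args.prompt, args.model)
    except Exception as exc:
        print(json.dumps({"error": str(exc)}), file=sys.stderr)
        return 1
    print(json.dumps({"image": args.image, "model": args.model, "text": text}))
    return 0
```

To run it as a script, call `main(sys.argv[1:])` under an `if __name__ == "__main__":` guard and exit with its return value. Emitting JSON on stdout and a nonzero exit code on failure gives the agent something unambiguous to parse from the bash tool's output.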
