A guide for using Content Understanding with Azure AI Foundry projects.
Azure Content Understanding (CU) extracts structured data from documents like invoices, receipts, and forms using AI-powered analysis. Content Understanding is available inside Foundry projects, but calls must go to the resource-level CU endpoint, not the project endpoint.
This guide covers:
- Setting up a Foundry project with CU access
- Calling the CU REST API with cURL
- Building a Python CLI tool using the Azure SDK
- Configuring custom model deployments (optional)
Content Understanding provides 80+ prebuilt analyzers for different use cases:
Common Analyzers:
- prebuilt-layout: Layout analysis with tables, figures, and structure (recommended default)
- prebuilt-documentSearch: Document ingestion for RAG with layout, figures, charts, summaries
- prebuilt-invoice: Invoices, utility bills, sales orders
- prebuilt-receipt: Sales receipts
- prebuilt-idDocument: IDs, passports, driver licenses (worldwide)
- prebuilt-tax.us.w2: W-2 forms
- prebuilt-contract: Business contracts
Categories Available:
- Content extraction (OCR, layout)
- RAG analyzers (document/image/audio/video search)
- Financial documents (invoices, receipts, bank statements)
- Identity documents (IDs, passports, insurance cards)
- US tax documents (1040, W-2, 1099s, 1098s)
- US mortgage documents (1003, 1004, closing disclosures)
- Business & legal documents (contracts, purchase orders)
- And many more specialized analyzers
Full List: https://learn.microsoft.com/azure/ai-services/content-understanding/concepts/prebuilt-analyzers
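If a script handles several document types, a small lookup table keeps the analyzer choice in one place. The sketch below uses only the analyzer IDs listed above; the short keys ("invoice", "rag", and so on) and the pick_analyzer() helper are hypothetical names chosen for illustration, and unknown kinds fall back to the recommended default.

```python
from typing import Optional

# Recommended default from the list above.
DEFAULT_ANALYZER = "prebuilt-layout"

# Hypothetical short keys mapped to analyzer IDs named in this guide.
ANALYZER_BY_KIND = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "w2": "prebuilt-tax.us.w2",
    "contract": "prebuilt-contract",
    "rag": "prebuilt-documentSearch",
}


def pick_analyzer(kind: Optional[str]) -> str:
    """Return the analyzer ID for a document kind, or the default if unknown."""
    if not kind:
        return DEFAULT_ANALYZER
    return ANALYZER_BY_KIND.get(kind.strip().lower(), DEFAULT_ANALYZER)
```

Anything not in the table (for example one of the mortgage or bank statement analyzers) would need its ID looked up in the full list first.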
Content Understanding is available at the Foundry resource level (e.g. in the default project).
- Go to Azure AI Foundry portal
- If you don't have a Foundry resource yet: Create one
- If you already have a Foundry resource: Use your default project
- Note your Project Endpoint:
https://<your-foundry-resource>.services.ai.azure.com/api/projects/<your-project>
Important: The CU endpoint is the base URL of your Foundry service, not the project endpoint:
Project Endpoint: https://<your-foundry-resource>.services.ai.azure.com/api/projects/<your-project>
CU Endpoint: https://<your-foundry-resource>.services.ai.azure.com ← Use this for CU calls
All CU API calls use paths under /contentunderstanding/.
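Since the CU endpoint is just the project endpoint without its path, it can be derived rather than configured twice. This is a minimal sketch using only the standard library; the helper names cu_endpoint_from_project_endpoint() and cu_url() are made up for this example, and the default API version is the one used in the calls in this guide.

```python
from urllib.parse import urlsplit, urlunsplit


def cu_endpoint_from_project_endpoint(project_endpoint: str) -> str:
    """Keep only scheme and host of a project endpoint, dropping /api/projects/..."""
    parts = urlsplit(project_endpoint)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Not an absolute URL: {project_endpoint!r}")
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


def cu_url(cu_endpoint: str, path: str, api_version: str = "2025-11-01") -> str:
    """Build a URL under /contentunderstanding/ on the CU endpoint."""
    base = cu_endpoint.rstrip("/")
    return f"{base}/contentunderstanding/{path.lstrip('/')}?api-version={api_version}"
```

For example, cu_url(cu_endpoint, "analyzers/prebuilt-layout:analyze") yields the kind of URL used for the analyze call.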
az login --tenant <your-tenant-id>
az account get-access-token --resource https://cognitiveservices.azure.com --query accessToken -o tsv
This token is used in the Authorization: Bearer <token> header.
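The same header can be built in Python from any credential object that exposes get_token(), which is the method azure-identity credentials such as DefaultAzureCredential provide. The sketch below assumes the scope is the CLI resource above with /.default appended; bearer_headers() is a hypothetical helper name, and the credential is passed in so the function itself has no Azure dependency.

```python
from typing import Dict

# Scope form of the resource passed to `az account get-access-token` above
# (assumption: resource URL plus the "/.default" suffix).
CU_SCOPE = "https://cognitiveservices.azure.com/.default"


def bearer_headers(credential, scope: str = CU_SCOPE) -> Dict[str, str]:
    """Request a token from the credential and build the request headers."""
    access_token = credential.get_token(scope)
    return {
        "Authorization": f"Bearer {access_token.token}",
        "Content-Type": "application/json",
    }


# Typical use (requires `pip install azure-identity` and a prior `az login`):
#   from azure.identity import DefaultAzureCredential
#   headers = bearer_headers(DefaultAzureCredential())
```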
# Start analysis with prebuilt-layout analyzer (recommended default)
curl -i -X POST "https://<your-foundry-resource>.services.ai.azure.com/contentunderstanding/analyzers/prebuilt-layout:analyze?api-version=2025-11-01" \
-H "Authorization: Bearer $(az account get-access-token --resource https://cognitiveservices.azure.com --query accessToken -o tsv)" \
-H "Content-Type: application/json" \
-d '{
"inputs":[{"url": "https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/invoice.pdf"}]
}'
# Or use a different analyzer - just change the analyzer name in the URL:
# ...analyzers/prebuilt-invoice:analyze...
# ...analyzers/prebuilt-documentSearch:analyze...
# ...analyzers/prebuilt-receipt:analyze...
The response includes:
- Status: 202 Accepted
- Operation-Location header with the full URL to poll
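When handling this response in code, the header lookup is worth doing case-insensitively, since header names may arrive in a different case depending on the HTTP client. The following is a small sketch with hypothetical helper names; it assumes the polling URL ends in the result ID, as in the example URL in this guide.

```python
from typing import Mapping
from urllib.parse import urlsplit


def operation_location(headers: Mapping[str, str]) -> str:
    """Find the Operation-Location header in a plain mapping, ignoring case."""
    for name, value in headers.items():
        if name.lower() == "operation-location":
            return value
    raise KeyError("Operation-Location header not found in response")


def result_id(operation_url: str) -> str:
    """Return the last path segment of the polling URL (the result ID)."""
    return urlsplit(operation_url).path.rstrip("/").rsplit("/", 1)[-1]
```

The result ID is handy for logging; the polling request itself should still use the full URL unchanged.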
# Extract the Operation-Location URL from the response headers
# Example: https://<your-foundry-resource>.services.ai.azure.com/contentunderstanding/analyzerResults/28bea330-d7e0-4159-b14c-9985df3fe4f4?api-version=2025-11-01
# Poll using that URL
curl -X GET "<operation-location-url>" \
-H "Authorization: Bearer $(az account get-access-token --resource https://cognitiveservices.azure.com --query accessToken -o tsv)"
# The operation-location URL already includes the api-version parameter, so use it as-is
For me, it's much easier to wrap this in a Python script. Here's how to do so using the azure-ai-projects SDK:
pip install azure-ai-projects azure-identity python-dotenv typer
Put both endpoints in a .env file, which the script loads with python-dotenv:
AZURE_AI_PROJECT_ENDPOINT=https://<your-foundry-resource>.services.ai.azure.com/api/projects/<your-project>
AZURE_CU_ENDPOINT=https://<your-foundry-resource>.services.ai.azure.com
"""
Content Understanding example using azure-ai-projects SDK.

Usage:
    python cu_example.py --help
    python cu_example.py analyze https://example.com/document.pdf
    python cu_example.py analyze --analyzer prebuilt-layout https://example.com/doc.pdf
"""
import os
import time
from typing import Dict, Any

from azure.ai.projects import AIProjectClient
from azure.core.rest import HttpRequest
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv
import typer

# Load environment variables
load_dotenv()

# Create Typer app
app = typer.Typer(help="Content Understanding CLI tool")


def analyze_document(
    project_client: AIProjectClient,
    cu_endpoint: str,
    analyzer_id: str,
    file_url: str,
    api_version: str = "2025-11-01"
) -> Dict[str, Any]:
    """
    Analyze a document using Content Understanding.

    Args:
        project_client: Authenticated AIProjectClient
        cu_endpoint: Content Understanding endpoint (base URL)
        analyzer_id: Analyzer to use (e.g., "prebuilt-invoice")
        file_url: URL of the document to analyze
        api_version: API version

    Returns:
        Analysis result
    """
    # Start analysis (strip a trailing slash so the path is not doubled)
    base = cu_endpoint.rstrip("/")
    analyze_url = f"{base}/contentunderstanding/analyzers/{analyzer_id}:analyze?api-version={api_version}"
    request = HttpRequest(
        method="POST",
        url=analyze_url,
        json={"inputs": [{"url": file_url}]}
    )
    response = project_client.send_request(request)
    response.raise_for_status()

    # Content Understanding returns results asynchronously
    # The response contains an Operation-Location header with the URL to poll for results
    operation_location = response.headers.get("Operation-Location")
    if not operation_location:
        raise RuntimeError("Response did not include an Operation-Location header")

    # Poll the operation URL until analysis completes
    # (this loop has no overall timeout)
    while True:
        request = HttpRequest(method="GET", url=operation_location)
        response = project_client.send_request(request)
        response.raise_for_status()
        result = response.json()
        status = result.get("status")
        if status == "Succeeded":
            return result
        elif status == "Failed":
            raise RuntimeError(f"Analysis failed: {result}")
        elif status in ["Running", "NotStarted"]:
            time.sleep(1)
        else:
            raise ValueError(f"Unknown status: {status}")


@app.command()
def analyze(
    file_url: str = typer.Argument(..., help="URL of the document to analyze"),
    analyzer: str = typer.Option("prebuilt-layout", "--analyzer", "-a", help="Analyzer to use"),
    output: str = typer.Option(None, "--output", "-o", help="Save result to file (JSON)"),
    output_md: str = typer.Option(None, "--output-md", "-m", help="Save markdown to file"),
    show_full: bool = typer.Option(False, "--full", "-f", help="Show full markdown output"),
):
    """Analyze a document using Content Understanding."""
    endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]
    # CU calls need the base URL. If AZURE_CU_ENDPOINT is unset, derive it by
    # dropping the /api/projects/<project> suffix from the project endpoint.
    cu_endpoint = os.environ.get("AZURE_CU_ENDPOINT") or endpoint.split("/api/projects/")[0]

    typer.echo(f"Analyzing document with {analyzer}...")
    with (
        DefaultAzureCredential() as credential,
        AIProjectClient(endpoint=endpoint, credential=credential) as project_client,
    ):
        result = analyze_document(
            project_client=project_client,
            cu_endpoint=cu_endpoint,
            analyzer_id=analyzer,
            file_url=file_url
        )

    status = result['status']
    typer.echo(f"Status: {status}")
    if status == "Succeeded":
        markdown = result['result']['contents'][0]['markdown']

        # Save to file if specified
        if output:
            import json
            with open(output, 'w', encoding='utf-8') as f:
                json.dump(result, f, indent=2)
            typer.echo(f"Saved full result to: {output}")

        # Save markdown to file if specified
        if output_md:
            with open(output_md, 'w', encoding='utf-8') as f:
                f.write(markdown)
            typer.echo(f"Saved markdown to: {output_md}")

        # Show markdown
        if show_full:
            typer.echo(f"\nExtracted Markdown:\n{markdown}")
        else:
            typer.echo(f"\nExtracted Markdown (preview):\n{markdown[:500]}...")
            typer.echo("\n(Use --full to see complete output)")


if __name__ == "__main__":
    app()
Note: This example uses Typer to provide a CLI interface. The core analyze_document() function can be used independently in your own scripts.
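If you reuse the polling logic in your own scripts, you may want it to give up eventually and to back off between requests. The sketch below is one way to do that, not part of the SDK: poll_until_done() is a hypothetical helper that takes a fetch callable returning the parsed JSON of one GET on the Operation-Location URL, and it treats the status strings the same way the script above does. The timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch: Callable[[], Dict[str, Any]],
    timeout: float = 300.0,
    interval: float = 1.0,
    max_interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the operation succeeds, fails, or the timeout is hit."""
    deadline = time.monotonic() + timeout
    delay = interval
    while True:
        result = fetch()
        status = result.get("status")
        if status == "Succeeded":
            return result
        if status == "Failed":
            raise RuntimeError(f"Analysis failed: {result}")
        if status not in ("Running", "NotStarted"):
            raise ValueError(f"Unknown status: {status}")
        # Stop before sleeping if the next wait would pass the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"Analysis still {status} after {timeout} seconds")
        sleep(delay)
        # Double the wait each round, capped at max_interval.
        delay = min(delay * 2, max_interval)
```

Inside analyze_document(), fetch would be a small function that sends the GET request, calls raise_for_status(), and returns response.json(). Passing sleep as a parameter makes the loop easy to test without real waiting.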
# Show help
python cu_example.py --help
# Analyze with default analyzer (prebuilt-layout)
python cu_example.py analyze https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/invoice.pdf
# Use different analyzers
python cu_example.py analyze --analyzer prebuilt-invoice https://example.com/invoice.pdf
python cu_example.py analyze --analyzer prebuilt-documentSearch https://example.com/report.pdf
python cu_example.py analyze --analyzer prebuilt-receipt https://example.com/receipt.jpg
# Save outputs to files
# to json
python cu_example.py analyze --output result.json https://example.com/invoice.pdf
# to markdown
python cu_example.py analyze --output-md result.md https://example.com/invoice.pdf
# to json AND markdown
python cu_example.py analyze -o result.json -m result.md https://example.com/invoice.pdf
# Show full markdown output (not just preview)
python cu_example.py analyze --full https://example.com/document.pdf
# Combine options
python cu_example.py analyze -a prebuilt-receipt -o receipt.json -m receipt.md --full https://example.com/receipt.jpg
If you want Content Understanding to use your own Foundry model deployments instead of Azure-managed models, you need active deployments in your Foundry project for:
- GPT-4.1 (any deployment name)
- GPT-4.1-mini (any deployment name)
- text-embedding-3-large (any deployment name)
Important: The JSON values are your deployment names in Foundry, not the model names.
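To make the key/value distinction concrete, here is a small sketch that builds the request body from a mapping of model name to deployment name and rejects a mapping with a missing entry. The modelDeployments key is the one used in the setup script in this guide; build_defaults_payload() and REQUIRED_MODELS are hypothetical names for this example.

```python
from typing import Dict, Mapping

# Model names (the JSON keys) that need a deployment, per the list above.
REQUIRED_MODELS = ("gpt-4.1", "gpt-4.1-mini", "text-embedding-3-large")


def build_defaults_payload(deployments: Mapping[str, str]) -> Dict[str, Dict[str, str]]:
    """Map each required model name to your deployment name; extra keys are ignored."""
    missing = [m for m in REQUIRED_MODELS if not deployments.get(m, "").strip()]
    if missing:
        raise ValueError(f"Missing deployment name for: {', '.join(missing)}")
    return {"modelDeployments": {m: deployments[m].strip() for m in REQUIRED_MODELS}}
```

For instance, a deployment of GPT-4.1 named "my-gpt4-prod" would appear as "gpt-4.1": "my-gpt4-prod" in the payload.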
"""
One-time setup to configure Content Understanding model deployments.
Run this once - settings persist across all CU calls.
"""
import os

from azure.ai.projects import AIProjectClient
from azure.core.rest import HttpRequest
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

load_dotenv()

endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]
cu_endpoint = os.environ["AZURE_CU_ENDPOINT"]

# Replace these with YOUR deployment names from Foundry
DEPLOYMENTS = {
    "gpt-4.1": "gpt-4.1",  # e.g., "gpt-4.1" or "my-gpt4-prod"
    "gpt-4.1-mini": "gpt-4.1-mini",  # e.g., "gpt-4.1-mini" or "gpt4-mini-v2"
    "text-embedding-3-large": "text-embedding-3-large"  # e.g., "text-embedding-3-large"
}

with (
    DefaultAzureCredential() as credential,
    AIProjectClient(endpoint=endpoint, credential=credential) as project_client,
):
    url = f"{cu_endpoint}/contentunderstanding/defaults?api-version=2025-11-01"
    request = HttpRequest(
        method="PATCH",
        url=url,
        json={"modelDeployments": DEPLOYMENTS}
    )
    response = project_client.send_request(request)
    response.raise_for_status()
    print(f"Configured deployments: {DEPLOYMENTS}")
Examples of deployment names:
- Simple: "gpt-4.1" (matches model name)
- With suffix: "gpt-4.1-prod", "gpt-4.1-v2"
- Custom: "my-reasoning-model", "prod-gpt4"
❌ Using project endpoint for CU calls
# Wrong - this won't work
cu_endpoint = "https://<your-foundry-resource>.services.ai.azure.com/api/projects/<your-project>"
✅ Using base URL for CU calls
# Correct
cu_endpoint = "https://<your-foundry-resource>.services.ai.azure.com"
❌ Using API key instead of bearer token
- Content Understanding requires bearer token authentication via Azure credentials
✅ Using DefaultAzureCredential
- Works automatically with az login or managed identity