
@theoknock
Last active November 22, 2025 07:49
A workaround for the overly cautious guardrail policy used by the default SystemLanguageModel.
import Foundation
import FoundationModels // Docs: https://developer.apple.com/documentation/foundationmodels
import Playgrounds // #Playground macro: https://developer.apple.com/documentation/xcode/running-code-snippets-using-the-playground-macro

#Playground {
    do {
        let instructions = """
        You are an interactive thesaurus. Task: Given a word and a starting letter, return a synonym of the word that starts with the letter.
        """
        // Configure a SystemLanguageModel to allow permissive content transformations (e.g., summarization), including potentially unsafe input
        let model = SystemLanguageModel(guardrails: .permissiveContentTransformations)
        // Remove the 'model:' argument to fall back to the default guardrails and see this prompt rejected as a possible safety-policy violation
        let session = LanguageModelSession(model: model, instructions: instructions)
        let prompt = """
        Word: cat
        Starting letter: f
        Answer: [Insert a synonym for cat that starts with an f]
        """
        let response = try await session.respond(to: prompt)
        print(response.content)
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        print("Generation error: Safety guardrails were triggered.")
    } catch {
        // Any other failure (model unavailable, context too long, and so on)
        print("Generation error: \(error)")
    }
}

theoknock commented Nov 22, 2025

Safety Last: How to completely work around guardrails

Technical summary

Apple’s Foundation Models framework exposes two relevant errors on LanguageModelSession: GenerationError.guardrailViolation and GenerationError.refusal. As documented by Apple, neither one returns the exact text that triggered the safety stop. The refusal case is declared with two associated values:

case refusal(LanguageModelSession.GenerationError.Refusal, LanguageModelSession.GenerationError.Context)

Developers can catch it as:

catch LanguageModelSession.GenerationError.refusal(let refusal, let context)

The first value (refusal) exposes an asynchronous explanation property: a model-generated description of why the request was declined, not an excerpt of the prompt or generated output. The second (context) carries a free-form debugDescription string. Guardrail violations carry only the context value, with no rule names or category labels.
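
As a rough sketch of how both errors might be handled in one place, the function below catches each case and prints whatever diagnostics it carries. It assumes the case shapes in Apple's published FoundationModels documentation (refusal carrying a Refusal value plus a Context, guardrailViolation carrying only a Context) and an OS 26 deployment target, so verify it against the SDK you build with. The function name askThesaurus and its prompt wording are hypothetical, made up for this illustration.

```swift
import FoundationModels

// Hypothetical helper: runs one thesaurus query and reports safety-related failures.
// Assumption: the error cases have the associated values described in Apple's docs.
func askThesaurus(word: String, letter: Character) async {
    let model = SystemLanguageModel(guardrails: .permissiveContentTransformations)
    let session = LanguageModelSession(
        model: model,
        instructions: "You are an interactive thesaurus."
    )
    do {
        let response = try await session.respond(
            to: "Give a synonym for \(word) that starts with \(letter)."
        )
        print(response.content)
    } catch LanguageModelSession.GenerationError.guardrailViolation(let context) {
        // The context holds a free-form debug string, not a rule name or category.
        print("Guardrail violation: \(context.debugDescription)")
    } catch LanguageModelSession.GenerationError.refusal(let refusal, let context) {
        // The explanation is generated by the model, so reading it is async and can throw.
        let explanation = try? await refusal.explanation
        print("Refusal: \(explanation?.content ?? context.debugDescription)")
    } catch {
        print("Unexpected error: \(error)")
    }
}
```

Calling `await askThesaurus(word: "cat", letter: "f")` from an async context exercises the same prompt as the gist. Falling back to `context.debugDescription` keeps the handler useful when the explanation itself cannot be generated.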

Characterization of the exchange

The user pressed for accuracy and correction. The assistant initially overstated limitations, the user challenged it, and the assistant revised its interpretation. The later messages focused tightly on clarifying how Apple’s API surfaces the offending content.
