A workaround for the overly cautious guardrail-violation policy used by the default SystemLanguageModel.
import Foundation
import FoundationModels // Docs: https://developer.apple.com/documentation/foundationmodels
import Playgrounds // #Playground macro: https://developer.apple.com/documentation/xcode/running-code-snippets-using-the-playground-macro

#Playground {
    do {
        let instructions = """
            You are an interactive thesaurus. Task: Given a word and a starting letter, return a synonym of the word that starts with the letter.
            """
        // Configure a SystemLanguageModel to allow permissive content transformations (e.g., summarization), including potentially unsafe input.
        let model = SystemLanguageModel(guardrails: .permissiveContentTransformations)
        // Remove the 'model' parameter to fall back to the default model, which can throw for possible violations of its safety policies.
        let session = LanguageModelSession(model: model, instructions: instructions)
        let prompt = """
            Word: cat
            Starting letter: f
            Answer: [Insert a synonym for cat that starts with an f]
            """
        let response = try await session.respond(to: prompt)
        print(response.content)
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        print("Generation error: Safety guardrails were triggered.")
    } catch {
        // Catch-all keeps the do/catch exhaustive for any other generation error.
        print("Generation error: \(error)")
    }
}
Safety Last: How to completely work around guardrails
Technical summary
Apple’s Foundation Models expose two relevant errors: GenerationError.guardrailViolation and GenerationError.refusal. Both can return information about what specific text triggered the safety stop. For refusal, the associated values include the refusal itself and the span of text that triggered it.
Developers can catch it as:
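A minimal sketch of that catch pattern, reusing the session and prompt from the gist above (the binding names reason and offendingSpan are illustrative, not API names; the second clause is included to show the parallel guardrail case):

    do {
        let response = try await session.respond(to: prompt)
        print(response.content)
    } catch LanguageModelSession.GenerationError.refusal(let reason, let offendingSpan) {
        // The second associated value is the diagnostic context; its
        // debugDescription includes the text that triggered the refusal.
        print("Refusal (\(reason)): \(offendingSpan.debugDescription)")
    } catch LanguageModelSession.GenerationError.guardrailViolation(let context) {
        // Similar diagnostic text, but without rule names or category labels.
        print("Guardrail violation: \(context.debugDescription)")
    }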
The second argument (offendingSpan) contains the exact portion of the prompt or generated output that caused the refusal. Guardrail violations provide similar diagnostic text, though without rule names or category labels.
Characterization of the exchange
The user pressed for accuracy and correction. The assistant initially overstated limitations, the user challenged it, and the assistant revised its interpretation. The later messages focused tightly on clarifying how Apple’s API surfaces the offending content.