```swift
import Foundation
import FoundationModels // Docs: https://developer.apple.com/documentation/foundationmodels
import Playgrounds // #Playground macro: https://developer.apple.com/documentation/xcode/running-code-snippets-using-the-playground-macro

#Playground {
    do {
        let instructions = """
            You are an interactive thesaurus. Task: Given a word and a starting letter, return a synonym of the word that starts with the letter.
            """

        // Configure a SystemLanguageModel to allow permissive content transformations
        // (e.g., summarization), including potentially unsafe input.
        let model = SystemLanguageModel(guardrails: .permissiveContentTransformations)

        // Omit the 'model' parameter to apply the default guardrails, which can
        // throw an error for a possible violation of safety policies.
        let session = LanguageModelSession(model: model, instructions: instructions)

        let prompt = """
            Word: cat
            Starting letter: f
            Answer: [Insert a synonym for cat that starts with an f]
            """

        let response = try await session.respond(to: prompt)
        print(response.content)
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        print("Generation error: Safety guardrails were triggered.")
    } catch {
        // Exhaustive fallback for any other generation error.
        print("Generation error: \(error)")
    }
}
```
# Safety Last: How to completely work around guardrails

## Technical summary
Apple’s Foundation Models framework exposes two relevant error cases: `LanguageModelSession.GenerationError.guardrailViolation` and `LanguageModelSession.GenerationError.refusal`. Both can return information about what specific text triggered the safety stop. For `refusal`, the associated values include an explanation and the offending span:

```swift
case refusal(explanation: String, offendingSpan: String)
```

Developers can catch it as:

```swift
catch GenerationError.refusal(let explanation, let offendingSpan)
```

The second associated value (`offendingSpan`) contains the exact portion of the prompt or generated output that caused the refusal. Guardrail violations provide similar diagnostic text, though without rule names or category labels.
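A minimal sketch of handling both cases around a single `respond(to:)` call, assuming the associated-value shapes described above. The shipping SDK may wrap these values in dedicated `Refusal` and `Context` types, so the catch patterns here are illustrative; check the current FoundationModels documentation for the exact signatures.

```swift
import FoundationModels

// Illustrative helper (not from the gist): sends a prompt and reports
// exactly which text triggered a safety stop, per the associated values
// described above.
func diagnose(_ session: LanguageModelSession, prompt: String) async {
    do {
        let response = try await session.respond(to: prompt)
        print(response.content)
    } catch LanguageModelSession.GenerationError.refusal(let explanation, let offendingSpan) {
        // The second associated value pinpoints the exact portion of the
        // prompt or generated output that caused the refusal.
        print("Refusal: \(explanation)")
        print("Offending span: \(offendingSpan)")
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        // Diagnostic text is available, but no rule names or category labels.
        print("Guardrail violation.")
    } catch {
        print("Other generation error: \(error)")
    }
}
```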
## Characterization of the exchange
The user pressed for accuracy and correction. The assistant initially overstated limitations, the user challenged it, and the assistant revised its interpretation. The later messages focused tightly on clarifying how Apple’s API surfaces the offending content.
The Canvas output with a `SystemLanguageModel` configured to allow permissive content transformations (e.g., summarization), including potentially unsafe input:

The Canvas output with a `SystemLanguageModel` configured to apply the default guardrail policy:
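For comparison, a minimal sketch of the default-policy configuration. The only change from the gist above is omitting the custom `model` argument, so the session falls back to `SystemLanguageModel.default` with standard guardrails:

```swift
// Default-guardrail variant: with no custom model, the session applies the
// standard safety policy, and the same thesaurus prompt can throw
// LanguageModelSession.GenerationError.guardrailViolation.
let session = LanguageModelSession(instructions: instructions)
```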