PR #365 (sso_user_id_updates by CapitolCoder) — 3,143 additions, 80 deletions, 16 files
PR #363 (feature/COM-18-ad-phase-1 by jordotech) — 3,124 additions, 264 deletions, 22 files
Both PRs target the same goal: make platform-api Auth0-aware so SSO users get proper UUIDs instead of Auth0 sub strings.
| Aspect | PR #365 | PR #363 |
|---|---|---|
| Base branch | feature/COM-18-ad-phase-1 (depends on #363) |
develop (standalone) |
| Auth discovery endpoint | Not included | POST /public/auth/discover (new) |
| Auth0 JWKS middleware | Not included (inherits from #363) | Added RS256 verification in check_org_access |
/organizations/me Bearer support |
Not included (inherits from #363) | Added Auth0 user from request.state.auth0_user |
/validate + /check Auth0 detection |
Yes — rewrites these endpoints | Yes — already implemented |
| User UUID resolution | New utils/user_resolution.py utility |
Inline in _validate_token_impl() |
| clj-pg-wrapper endpoint used | GET /admin/user/{email} (read-only lookup) + POST /api/v1/users/by-email (get-or-create) |
POST /api/v1/users/ensure (get-or-create) |
| JIT provisioning enhancement | Adds clj-pg-wrapper user creation in /organizations/me |
Not needed — /users/ensure is called from /validate |
AUTH0_TEST_MODE |
New env var + HS256 test token support | Not needed |
| Shell-based e2e tests | 420-line sso_e2e.just with bash scripts |
Not included |
| New Python files | 4 (user_resolution.py, 3 test files, 4 scripts) |
3 (auth_discover.py, schemas/auth_discover.py, 1 test file) |
PR #363 (ours): UUID resolution happens at /validate time via POST /users/ensure. Every Auth0 token validation creates-or-returns the user. Simple, single call site.
PR #365: Splits resolution into two separate utilities:
lookup_user_uuid_by_email()— read-onlyGET /admin/user/{email}, used inget_current_user(),check_org_access(), and/validateget_or_create_user_by_email()—POST /api/v1/users/by-email, used only in/organizations/meJIT
This means a first-time SSO user hitting /validate gets a 401 ("User not provisioned yet") because the read-only lookup returns None. They must hit /organizations/me first to trigger JIT creation. This creates an ordering dependency in the auth flow.
-
First-time SSO user gets 401 on
/validate— The read-onlylookup_user_uuid_by_email()returnsNonefor new users, and the endpoint returns401 "User not provisioned yet"instead of creating the user. The agentic-backend calls/validatefirst — SSO users would be blocked until they visit the frontend and trigger/organizations/meJIT. -
Depends on PR #363 — Base branch is
feature/COM-18-ad-phase-1, meaning it layers on top of our PR. It can't be merged independently. -
Depends on PR #26's endpoint —
user_resolution.pycallsPOST /api/v1/users/by-email(from PR #26 in clj-pg-wrapper), not our/users/ensure. These endpoints have different semantics (email_verified=TruevsFalse,EmailStrvalidation vs plain string). -
Rewrites code that already works —
auth.pyis rewritten from scratch, replacing our working_validate_token_impl()with_handle_token_validation(). The new version is async but introduces the 401 regression for new users. -
AUTH0_TEST_MODEis a security surface — Adds an env var that accepts HS256 self-signed tokens when enabled. While useful for e2e, this pattern risks accidental enablement in production. -
Dual-header forwarding in
/validate— AddsAuthorization: Bearerheader parsing directly in the/validateand/checkendpoints. This duplicates what agentic-backend already does (it forwards viaX-User-Token). The existing flow works: agentic-backend extracts the Bearer token and sends it asX-User-Token. -
Over-engineered test infrastructure — 420-line bash e2e justfile, 4 Python scripts, 1,063-line e2e test file, 946-line SSO auth test file. Total: ~2,800 lines of test code for ~300 lines of implementation.
-
verify_auth0_user_tokenreturnsuser_id=None— Changes the existing function to returnNoneforuser_idinstead of the Auth0sub. This breaks the existing contract — any code that relied onuser_idbeing non-None after successful verification now fails.
- The separation of read-only lookup vs get-or-create is conceptually clean
- JIT user creation in
/organizations/mealongside DynamoDB membership is a good idea - The e2e test coverage (if it worked) would be valuable
- The
capitol_user_idJWT claim future-proofing is forward-thinking
PR #363 should be the base. It's already merged-ready with working auth discovery, JWKS middleware, Bearer token support, and UUID resolution via /users/ensure. PR #365 layers on top of it but introduces a 401 regression for first-time SSO users and requires PR #26's /by-email endpoint.
If specific ideas from #365 are valuable (JIT user creation in /organizations/me, test infrastructure), they can be cherry-picked as follow-ups after #363 lands.
Good points. Addressing each:
1. JIT ordering / 401 on
/validateThe data sync concern is valid, but the 401 creates a real UX problem: agentic-backend calls
/validate(not/organizations/me) as its auth path. A first-time SSO user opening the workflow builder would get a 401 before they ever hit/organizations/me. The frontend flow happens to hit/organizations/mefirst, but agentic-backend doesn't follow that sequence.Our approach — get-or-create at
/validatetime viaPOST /users/ensure— creates the Postgres user eagerly. The DynamoDB membership is still created separately during JIT in/organizations/me. These don't need to be atomic: a user existing in Postgres without a DynamoDB membership is a safe state (they just won't see the org yet). The reverse (DynamoDB membership without Postgres user) can't happen because/organizations/meruns after/validate.2. AUTH0_TEST_MODE
Agreed — makes sense for local e2e but should be gated carefully. No disagreement here.
3. Auth0 sub fallback → data corruption risk
This is a valid catch. Our code had a fallback where if clj-pg-wrapper was unreachable, we'd use the Auth0
sub(e.g.waad|abc123) asuserId. That would let it leak into agentic-backend's database.Fixed: we now return 502 instead of falling back to the Auth0 sub. If clj-pg-wrapper can't resolve the UUID, the request fails rather than returning a corrupted identity. Pushed as part of the next commit.
Thanks for flagging #3 — that was a real bug.