Skip to content

Instantly share code, notes, and snippets.

@jordotech
Created February 25, 2026 16:20
Show Gist options
  • Select an option

  • Save jordotech/54c98410fd194fb28ca7db7859fdb9cb to your computer and use it in GitHub Desktop.

Select an option

Save jordotech/54c98410fd194fb28ca7db7859fdb9cb to your computer and use it in GitHub Desktop.

Platform-API PR Comparison: #365 vs #363

PR #365 (sso_user_id_updates by CapitolCoder) — 3,143 additions, 80 deletions, 16 files PR #363 (feature/COM-18-ad-phase-1 by jordotech) — 3,124 additions, 264 deletions, 22 files

Both PRs target the same goal: make platform-api Auth0-aware so SSO users get proper UUIDs instead of Auth0 sub strings.

Scope Comparison

Aspect PR #365 PR #363
Base branch feature/COM-18-ad-phase-1 (depends on #363) develop (standalone)
Auth discovery endpoint Not included POST /public/auth/discover (new)
Auth0 JWKS middleware Not included (inherits from #363) Added RS256 verification in check_org_access
/organizations/me Bearer support Not included (inherits from #363) Added Auth0 user from request.state.auth0_user
/validate + /check Auth0 detection Yes — rewrites these endpoints Yes — already implemented
User UUID resolution New utils/user_resolution.py utility Inline in _validate_token_impl()
clj-pg-wrapper endpoint used GET /admin/user/{email} (read-only lookup) + POST /api/v1/users/by-email (get-or-create) POST /api/v1/users/ensure (get-or-create)
JIT provisioning enhancement Adds clj-pg-wrapper user creation in /organizations/me Not needed — /users/ensure is called from /validate
AUTH0_TEST_MODE New env var + HS256 test token support Not needed
Shell-based e2e tests 420-line sso_e2e.just with bash scripts Not included
New Python files 4 (user_resolution.py, 3 test files, 4 scripts) 3 (auth_discover.py, schemas/auth_discover.py, 1 test file)

Key Architectural Difference

PR #363 (ours): UUID resolution happens at /validate time via POST /users/ensure. Every Auth0 token validation creates-or-returns the user. Simple, single call site.

PR #365: Splits resolution into two separate utilities:

  • lookup_user_uuid_by_email() — read-only GET /admin/user/{email}, used in get_current_user(), check_org_access(), and /validate
  • get_or_create_user_by_email()POST /api/v1/users/by-email, used only in /organizations/me JIT

This means a first-time SSO user hitting /validate gets a 401 ("User not provisioned yet") because the read-only lookup returns None. They must hit /organizations/me first to trigger JIT creation. This creates an ordering dependency in the auth flow.

Issues with PR #365

  1. First-time SSO user gets 401 on /validate — The read-only lookup_user_uuid_by_email() returns None for new users, and the endpoint returns 401 "User not provisioned yet" instead of creating the user. The agentic-backend calls /validate first — SSO users would be blocked until they visit the frontend and trigger /organizations/me JIT.

  2. Depends on PR #363 — Base branch is feature/COM-18-ad-phase-1, meaning it layers on top of our PR. It can't be merged independently.

  3. Depends on PR #26's endpointuser_resolution.py calls POST /api/v1/users/by-email (from PR #26 in clj-pg-wrapper), not our /users/ensure. These endpoints have different semantics (email_verified=True vs False, EmailStr validation vs plain string).

  4. Rewrites code that already worksauth.py is rewritten from scratch, replacing our working _validate_token_impl() with _handle_token_validation(). The new version is async but introduces the 401 regression for new users.

  5. AUTH0_TEST_MODE is a security surface — Adds an env var that accepts HS256 self-signed tokens when enabled. While useful for e2e, this pattern risks accidental enablement in production.

  6. Dual-header forwarding in /validate — Adds Authorization: Bearer header parsing directly in the /validate and /check endpoints. This duplicates what agentic-backend already does (it forwards via X-User-Token). The existing flow works: agentic-backend extracts the Bearer token and sends it as X-User-Token.

  7. Over-engineered test infrastructure — 420-line bash e2e justfile, 4 Python scripts, 1,063-line e2e test file, 946-line SSO auth test file. Total: ~2,800 lines of test code for ~300 lines of implementation.

  8. verify_auth0_user_token returns user_id=None — Changes the existing function to return None for user_id instead of the Auth0 sub. This breaks the existing contract — any code that relied on user_id being non-None after successful verification now fails.

What PR #365 Gets Right

  • The separation of read-only lookup vs get-or-create is conceptually clean
  • JIT user creation in /organizations/me alongside DynamoDB membership is a good idea
  • The e2e test coverage (if it worked) would be valuable
  • The capitol_user_id JWT claim future-proofing is forward-thinking

Recommendation

PR #363 should be the base. It's already merged-ready with working auth discovery, JWKS middleware, Bearer token support, and UUID resolution via /users/ensure. PR #365 layers on top of it but introduces a 401 regression for first-time SSO users and requires PR #26's /by-email endpoint.

If specific ideas from #365 are valuable (JIT user creation in /organizations/me, test infrastructure), they can be cherry-picked as follow-ups after #363 lands.

@jordotech
Copy link
Author

Good points. Addressing each:

1. JIT ordering / 401 on /validate

The data sync concern is valid, but the 401 creates a real UX problem: agentic-backend calls /validate (not /organizations/me) as its auth path. A first-time SSO user opening the workflow builder would get a 401 before they ever hit /organizations/me. The frontend flow happens to hit /organizations/me first, but agentic-backend doesn't follow that sequence.

Our approach — get-or-create at /validate time via POST /users/ensure — creates the Postgres user eagerly. The DynamoDB membership is still created separately during JIT in /organizations/me. These don't need to be atomic: a user existing in Postgres without a DynamoDB membership is a safe state (they just won't see the org yet). The reverse (DynamoDB membership without Postgres user) can't happen because /organizations/me runs after /validate.

2. AUTH0_TEST_MODE

Agreed — makes sense for local e2e but should be gated carefully. No disagreement here.

3. Auth0 sub fallback → data corruption risk

This is a valid catch. Our code had a fallback where if clj-pg-wrapper was unreachable, we'd use the Auth0 sub (e.g. waad|abc123) as userId. That would let it leak into agentic-backend's database.

Fixed: we now return 502 instead of falling back to the Auth0 sub. If clj-pg-wrapper can't resolve the UUID, the request fails rather than returning a corrupted identity. Pushed as part of the next commit.

Thanks for flagging #3 — that was a real bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment