Skip to content

Instantly share code, notes, and snippets.

@jordotech
Created February 25, 2026 16:20
Show Gist options
  • Select an option

  • Save jordotech/54c98410fd194fb28ca7db7859fdb9cb to your computer and use it in GitHub Desktop.

Select an option

Save jordotech/54c98410fd194fb28ca7db7859fdb9cb to your computer and use it in GitHub Desktop.

Platform-API PR Comparison: #365 vs #363

PR #365 (sso_user_id_updates by CapitolCoder) — 3,143 additions, 80 deletions, 16 files PR #363 (feature/COM-18-ad-phase-1 by jordotech) — 3,124 additions, 264 deletions, 22 files

Both PRs target the same goal: make platform-api Auth0-aware so SSO users get proper UUIDs instead of Auth0 sub strings.

Scope Comparison

Aspect PR #365 PR #363
Base branch feature/COM-18-ad-phase-1 (depends on #363) develop (standalone)
Auth discovery endpoint Not included POST /public/auth/discover (new)
Auth0 JWKS middleware Not included (inherits from #363) Added RS256 verification in check_org_access
/organizations/me Bearer support Not included (inherits from #363) Added Auth0 user from request.state.auth0_user
/validate + /check Auth0 detection Yes — rewrites these endpoints Yes — already implemented
User UUID resolution New utils/user_resolution.py utility Inline in _validate_token_impl()
clj-pg-wrapper endpoint used GET /admin/user/{email} (read-only lookup) + POST /api/v1/users/by-email (get-or-create) POST /api/v1/users/ensure (get-or-create)
JIT provisioning enhancement Adds clj-pg-wrapper user creation in /organizations/me Not needed — /users/ensure is called from /validate
AUTH0_TEST_MODE New env var + HS256 test token support Not needed
Shell-based e2e tests 420-line sso_e2e.just with bash scripts Not included
New Python files 4 (user_resolution.py, 3 test files, 4 scripts) 3 (auth_discover.py, schemas/auth_discover.py, 1 test file)

Key Architectural Difference

PR #363 (ours): UUID resolution happens at /validate time via POST /users/ensure. Every Auth0 token validation creates-or-returns the user. Simple, single call site.

PR #365: Splits resolution into two separate utilities:

  • lookup_user_uuid_by_email() — read-only GET /admin/user/{email}, used in get_current_user(), check_org_access(), and /validate
  • get_or_create_user_by_email()POST /api/v1/users/by-email, used only in /organizations/me JIT

This means a first-time SSO user hitting /validate gets a 401 ("User not provisioned yet") because the read-only lookup returns None. They must hit /organizations/me first to trigger JIT creation. This creates an ordering dependency in the auth flow.

Issues with PR #365

  1. First-time SSO user gets 401 on /validate — The read-only lookup_user_uuid_by_email() returns None for new users, and the endpoint returns 401 "User not provisioned yet" instead of creating the user. The agentic-backend calls /validate first — SSO users would be blocked until they visit the frontend and trigger /organizations/me JIT.

  2. Depends on PR #363 — Base branch is feature/COM-18-ad-phase-1, meaning it layers on top of our PR. It can't be merged independently.

  3. Depends on PR #26's endpointuser_resolution.py calls POST /api/v1/users/by-email (from PR #26 in clj-pg-wrapper), not our /users/ensure. These endpoints have different semantics (email_verified=True vs False, EmailStr validation vs plain string).

  4. Rewrites code that already worksauth.py is rewritten from scratch, replacing our working _validate_token_impl() with _handle_token_validation(). The new version is async but introduces the 401 regression for new users.

  5. AUTH0_TEST_MODE is a security surface — Adds an env var that accepts HS256 self-signed tokens when enabled. While useful for e2e, this pattern risks accidental enablement in production.

  6. Dual-header forwarding in /validate — Adds Authorization: Bearer header parsing directly in the /validate and /check endpoints. This duplicates what agentic-backend already does (it forwards via X-User-Token). The existing flow works: agentic-backend extracts the Bearer token and sends it as X-User-Token.

  7. Over-engineered test infrastructure — 420-line bash e2e justfile, 4 Python scripts, 1,063-line e2e test file, 946-line SSO auth test file. Total: ~2,800 lines of test code for ~300 lines of implementation.

  8. verify_auth0_user_token returns user_id=None — Changes the existing function to return None for user_id instead of the Auth0 sub. This breaks the existing contract — any code that relied on user_id being non-None after successful verification now fails.

What PR #365 Gets Right

  • The separation of read-only lookup vs get-or-create is conceptually clean
  • JIT user creation in /organizations/me alongside DynamoDB membership is a good idea
  • The e2e test coverage (if it worked) would be valuable
  • The capitol_user_id JWT claim future-proofing is forward-thinking

Recommendation

PR #363 should be the base. It's already merged-ready with working auth discovery, JWKS middleware, Bearer token support, and UUID resolution via /users/ensure. PR #365 layers on top of it but introduces a 401 regression for first-time SSO users and requires PR #26's /by-email endpoint.

If specific ideas from #365 are valuable (JIT user creation in /organizations/me, test infrastructure), they can be cherry-picked as follow-ups after #363 lands.

@CapitolCoder
Copy link

1. First-time SSO user gets 401 on /validate — The read-only lookup_user_uuid_by_email() returns None for new users, and the endpoint returns 401 "User not provisioned yet" instead of creating the user. The agentic-backend calls /validate first — SSO users would be blocked until they visit the frontend and trigger /organizations/me JIT.
Did this intentionally so that the member creation in DynamoDB and user creation in clj pg wrapper are in sync during JIT.
Reduces chances of data mismatch where user exists in clj but not in dynamo and vice versa

2.AUTH0_TEST_MODE is a security surface — Adds an env var that accepts HS256 self-signed tokens when enabled. While useful for e2e, this pattern risks accidental enablement in production.
This is a risk and did not like adding it either, but was the quickest to run end to end tests locally and can/should be removed

3.verify_auth0_user_token returns user_id=None — Changes the existing function to return None for user_id instead of the Auth0 sub. This breaks the existing contract — any code that relied on user_id being non-None after successful verification now fails.
This is very intentional, we do not want to use the auth zero sub field as the user id anywhere in platform api code or other services, even as a fallback.
This prevents agentic-backend from using the auth zero sub as user uuid in its database and avoids data corruption.
We should only use the canonical user uuid from clj in all our services.

@jordotech
Copy link
Author

Good points. Addressing each:

1. JIT ordering / 401 on /validate

The data sync concern is valid, but the 401 creates a real UX problem: agentic-backend calls /validate (not /organizations/me) as its auth path. A first-time SSO user opening the workflow builder would get a 401 before they ever hit /organizations/me. The frontend flow happens to hit /organizations/me first, but agentic-backend doesn't follow that sequence.

Our approach — get-or-create at /validate time via POST /users/ensure — creates the Postgres user eagerly. The DynamoDB membership is still created separately during JIT in /organizations/me. These don't need to be atomic: a user existing in Postgres without a DynamoDB membership is a safe state (they just won't see the org yet). The reverse (DynamoDB membership without Postgres user) can't happen because /organizations/me runs after /validate.

2. AUTH0_TEST_MODE

Agreed — makes sense for local e2e but should be gated carefully. No disagreement here.

3. Auth0 sub fallback → data corruption risk

This is a valid catch. Our code had a fallback where if clj-pg-wrapper was unreachable, we'd use the Auth0 sub (e.g. waad|abc123) as userId. That would let it leak into agentic-backend's database.

Fixed: we now return 502 instead of falling back to the Auth0 sub. If clj-pg-wrapper can't resolve the UUID, the request fails rather than returning a corrupted identity. Pushed as part of the next commit.

Thanks for flagging #3 — that was a real bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment