@drewr
Created March 9, 2026 21:32
id: ops-001
title: Manual Signup Defense Workflow
status: draft
created: 2026-03-09
author: architect

Manual Signup Defense Workflow

Overview

Datum Cloud is a SaaS infrastructure platform that currently does not require identity verification or credit card collection at signup. This creates an asymmetric risk profile: the cost of abuse (compute, network egress, support burden, reputation) is borne entirely by Datum, while the cost of attempting abuse approaches zero for bad actors.

This document defines a practical, manually-executable defense workflow for a small ops/security team. It covers every stage of the user lifecycle: before a signup attempt reaches the form, the moment a signup event fires, the hours and days that follow, and the escalation path when abuse is confirmed.

The workflow is designed to be executable today without automated tooling, while identifying where automation will produce the highest leverage as the team scales.


Threat Model

Before defining controls, it is important to name the threats this workflow is designed to address. Controls should be proportionate to the threat.

| Threat | Description | Primary Harm |
|---|---|---|
| Spam account farms | Bulk signups to harvest free-tier resources | Compute/egress cost |
| Credential stuffing launch pads | Accounts used to proxy attacks on third parties | Reputation, legal |
| Fraudulent trial abuse | Repeated signups to reset free-tier limits | Revenue leakage |
| Competitive intelligence scraping | Bulk signups to probe API surface and pricing | Business harm |
| Lateral pivoting from compromised IdPs | Accounts authenticated via stolen Google/GitHub tokens | Data integrity |
| Insider/supply-chain staging | Accounts created to establish long-term persistence | Security posture |

This workflow primarily addresses the first four. The last two require separate controls (IdP security policies, session monitoring) that are out of scope here.


Requirements

Functional Requirements

  • FR1: Provide low-friction, CAPTCHA-free gates at the signup form that deter automated signups without penalizing legitimate users.
  • FR2: Classify every new signup as low-risk, medium-risk, or high-risk within minutes of account creation using observable signals, without requiring human review of every signup.
  • FR3: Provide a daily triage process that a single ops team member can complete in under 30 minutes for normal volume.
  • FR4: Define clear, documented escalation criteria so any team member can initiate an escalation without judgment calls.
  • FR5: Provide an offboarding procedure that terminates abusive accounts cleanly and preserves evidence for potential legal action.
  • FR6: Track all review decisions and actions in a durable log.

Non-Functional Requirements

  • NFR1: The workflow must not require changes to the Datum platform API or database schema to operate.
  • NFR2: False-positive rate on high-risk classification must be kept low enough that the team can manually review every flagged signup without backlog.
  • NFR3: Time from confirmed abuse to account suspension must be under 4 hours during business hours, under 8 hours outside business hours.
  • NFR4: All data collected during review must comply with applicable privacy regulations (GDPR, CCPA). Retain only what is necessary.

Design

Stage 0: Pre-Signup Gates

Pre-signup gates are controls applied before an account is created. Their purpose is to raise the cost of automated and bulk signups without creating friction for genuine users.

Gate 0-A: Email Domain Filtering

Maintain a block list of known disposable email providers (e.g., Mailinator, Guerrilla Mail, 10minutemail, and their known variants). Check the signup email domain against this list at form submission time.

Signals to collect:

  • Email domain age (use WHOIS; domains under 30 days old are high-risk)
  • MX record existence (no MX record = likely not a real company domain)
  • Whether the domain is a free consumer provider (gmail.com, yahoo.com, etc.) — not a block, but a risk signal to carry forward
  • Whether the email appears in known breach datasets (HaveIBeenPwned API)

Decision at this gate:

  • Disposable domain: hard block, return a user-visible error.
  • Domain with no MX: soft block, require manual approval before access is granted.
  • Free consumer domain: allow, but increment risk score.
  • Breach hit: allow, but flag for post-signup review.

Tools: disposable-email-domains open-source list (GitHub: disposable-email-domains), DNS lookup, WHOIS API, HaveIBeenPwned API (free tier).
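
The decisions at this gate can be sketched as a small pure function. This is a minimal illustration, not the Datum implementation: the function name `email_gate`, the result type, and the two sample domain sets are hypothetical, and the MX and breach lookups are assumed to be done by the caller. The points follow the additive rubric in Step 1-A; domain age is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical sample data. A real deployment would load the full
# disposable-email-domains list and a maintained free-provider list.
DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com", "10minutemail.com"}
FREE_CONSUMER_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}


@dataclass
class EmailGateResult:
    decision: str          # "block", "manual_approval", or "allow"
    risk_points: int       # points carried forward to the Stage 1 score
    flag_for_review: bool  # True when a breach hit needs post-signup review


def email_gate(email: str, has_mx: bool, in_breach_dataset: bool) -> EmailGateResult:
    """Apply the Gate 0-A decisions to one signup email.

    has_mx and in_breach_dataset are supplied by the caller (DNS query,
    HaveIBeenPwned lookup) so this function makes no network calls.
    """
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if domain in DISPOSABLE_DOMAINS:
        return EmailGateResult("block", 0, False)
    if not has_mx:
        return EmailGateResult("manual_approval", 2, False)
    points = 0
    if domain in FREE_CONSUMER_DOMAINS:
        points += 1
    if in_breach_dataset:
        points += 1
    return EmailGateResult("allow", points, in_breach_dataset)
```

Keeping the lookups outside the function makes the decision logic easy to test and lets the gate run in warn-only mode by ignoring the `decision` field.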

Gate 0-B: IP Reputation Check

At the moment the signup form is submitted, collect the originating IP address and check it against freely available reputation data.

Signals to collect:

  • Is the IP a known Tor exit node? (dan.me.uk/torlist or similar)
  • Is the IP a known hosting/datacenter range? (e.g., AWS, DigitalOcean, Hetzner IP blocks) — not a hard block, but a strong signal that the signup is automated.
  • Is the IP on a public blocklist (Spamhaus DROP/EDROP)?
  • Geographic origin relative to the browser's Accept-Language header (mismatch is a signal, not a block).

Decision at this gate:

  • Tor exit node: require email verification before any resource creation is permitted. Log heavily.
  • Datacenter IP range: allow signup, but flag account as high-risk pending post-signup review.
  • Spamhaus blocklist: soft block, surface a "contact us" CTA rather than a form.
  • Geo/language mismatch alone: risk signal only.

Tools: ip-api.com (free tier for low volume), AbuseIPDB (free tier), MaxMind GeoLite2 (free, self-hosted).
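
A rough sketch of this gate using only the Python standard library follows. The ranges are placeholder documentation blocks (RFC 5737), not real provider or blocklist data. The function name `ip_gate` is hypothetical. The precedence when several signals match (blocklist, then Tor, then datacenter) is an assumption, since the gate does not define one.

```python
import ipaddress

# Placeholder ranges taken from the RFC 5737 documentation blocks, for
# illustration only. Real data would come from provider-published ranges,
# a Tor exit list, and Spamhaus DROP/EDROP.
DATACENTER_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
BLOCKLIST_RANGES = [ipaddress.ip_network("198.51.100.0/24")]
TOR_EXIT_IPS = {ipaddress.ip_address("203.0.113.7")}


def ip_gate(ip_text: str) -> dict:
    """Return the Gate 0-B signals and decision for one signup IP."""
    ip = ipaddress.ip_address(ip_text)
    is_tor = ip in TOR_EXIT_IPS
    is_datacenter = any(ip in net for net in DATACENTER_RANGES)
    is_blocklisted = any(ip in net for net in BLOCKLIST_RANGES)

    # Assumed precedence: the most restrictive decision wins.
    if is_blocklisted:
        decision = "soft_block"
    elif is_tor:
        decision = "require_email_verification"
    elif is_datacenter:
        decision = "allow_flag_high_risk"
    else:
        decision = "allow"

    return {
        "ip_tor": is_tor,
        "ip_datacenter": is_datacenter,
        "ip_blocklisted": is_blocklisted,
        "decision": decision,
    }
```

The `ip_tor` and `ip_datacenter` keys match the field names used in the structured signup event, so the result can be merged into that event directly.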

Gate 0-C: Signup Rate Limiting

Apply a per-IP and per-email-domain rate limit on signup form submissions.

Thresholds (adjust based on observed baseline):

  • Per IP: no more than 3 signup attempts per hour.
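  • Per email domain: no more than 5 new accounts per hour from the same corporate domain. Free consumer providers such as gmail.com should be exempt from this limit, since many unrelated users share them. This allows legitimate company onboarding without enabling domain-based abuse.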
  • Global: if the platform-wide signup rate exceeds 2x the 7-day rolling average, trigger an ops alert and temporarily enable a manual review queue for all new signups.

Tools: Cloudflare rate limiting rules (if Datum uses Cloudflare), or equivalent edge/CDN rule. Can be implemented at the application layer if no CDN is in place.
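
If the limit is enforced at the application layer, one option is a sliding-window counter kept in memory. The sketch below is an assumption about how that could look, not an existing Datum component. The class and function names are hypothetical, and the state lives in a single process, so a multi-instance deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_events per key within window_seconds."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True


# Thresholds from Gate 0-C: 3 per hour per IP, 5 per hour per email domain.
per_ip = SlidingWindowLimiter(max_events=3, window_seconds=3600)
per_domain = SlidingWindowLimiter(max_events=5, window_seconds=3600)


def signup_allowed(ip: str, email_domain: str, now: Optional[float] = None) -> bool:
    # Short-circuits: a signup rejected by the IP limit is not counted
    # against the email domain.
    return per_ip.allow(ip, now) and per_domain.allow(email_domain, now)
```

A rejected attempt is not recorded, so a client that keeps retrying is readmitted once its oldest allowed attempt leaves the window.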

Gate 0-D: IdP-Level Signals

Datum Cloud authenticates users through trusted Identity Providers (Google, GitHub, and enterprise IdPs via OIDC/SAML per the IAM Authentication enhancement). This provides several signals without any additional verification burden on the user.

Signals to collect from the IdP token at account creation time:

  • Account age of the IdP account (GitHub: account creation date via API; Google: email age is not directly available, use account creation heuristics).
  • Whether the GitHub account has zero public activity (repos, stars, PRs) — a strong signal of a throwaway account.
  • Whether the Google account has no profile photo and a recently-created email format (e.g., randomstring123@gmail.com).
  • Whether the IdP account was created within the last 7 days.

Decision at this gate:

  • GitHub account under 7 days old with zero activity: flag as high-risk.
  • Any IdP account created in the last 48 hours: require email verification of a secondary address before resource creation.

Tools: GitHub API (unauthenticated for public account metadata, subject to a limit of 60 requests per hour per IP; use an authenticated token at higher volume), Google People API (if the user grants profile scope during OAuth).
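
As an illustration, the GitHub signals can be derived from the public user object returned by `GET https://api.github.com/users/{username}`, which includes `created_at`, `public_repos`, `public_gists`, and `followers`. The sketch assumes the caller has already fetched and parsed that JSON. The function name `github_signals` is hypothetical. Treating zero repos, gists, and followers as "zero activity" is an approximation, because stars and PRs are not part of that response.

```python
from datetime import datetime, timedelta, timezone


def github_signals(user: dict, now: datetime) -> dict:
    """Derive Gate 0-D signals from a parsed GitHub user object."""
    created = datetime.strptime(user["created_at"], "%Y-%m-%dT%H:%M:%SZ")
    created = created.replace(tzinfo=timezone.utc)
    age = now - created

    # Assumption: repos, gists, and followers all at zero stands in for
    # "zero public activity". Stars and PRs would need further API calls.
    zero_activity = all(
        user.get(field, 0) == 0
        for field in ("public_repos", "public_gists", "followers")
    )
    under_7_days = age < timedelta(days=7)

    return {
        "idp_account_age_days": age.days,
        "idp_account_under_7_days": under_7_days,
        "idp_zero_activity": zero_activity,
        "flag_high_risk": under_7_days and zero_activity,
        "require_secondary_email": age < timedelta(hours=48),
    }
```

Passing `now` explicitly, as a timezone-aware UTC datetime, keeps the function deterministic for testing.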


Stage 1: Signup Event Handling

This stage covers the actions taken immediately after a new account is successfully created. The goal is to capture a risk snapshot before the user takes any meaningful action.

Step 1-A: Emit a Structured Signup Event

At the moment of account creation, emit a structured event to the ops team's monitoring channel (Slack, PagerDuty, or equivalent). The event should include:

NEW SIGNUP
  user_id:         <datum user id>
  email:           <email address>
  email_domain:    <domain only>
  idp:             google | github | enterprise
  idp_account_age: <days, if available>
  signup_ip:       <ip address>
  ip_datacenter:   true | false
  ip_tor:          true | false
  risk_score:      low | medium | high
  timestamp:       <UTC ISO8601>

The risk score is computed from the pre-signup gate signals above. A simple additive rubric works at small scale:

| Signal | Points |
|---|---|
| Free consumer email domain | +1 |
| Email in breach dataset | +1 |
| No MX record on domain | +2 |
| Domain under 30 days old | +2 |
| Datacenter IP range | +2 |
| Tor exit node | +4 |
| IdP account under 7 days old | +3 |
| IdP account with zero activity | +2 |
| IP on abuse blocklist | +3 |

Score thresholds:

  • 0-2: low risk. No immediate action. Add to daily review batch.
  • 3-5: medium risk. Send welcome email with extended verification prompt. Review within 24 hours.
  • 6+: high risk. Restrict account to read-only access immediately. Assign to a team member for same-day manual review.
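
The rubric and thresholds above translate directly into a lookup table and a sum. In this minimal sketch the signal keys and the function name `risk_score` are illustrative choices, while the point values and thresholds are taken from the rubric.

```python
from typing import Iterable, Tuple

# Point values from the additive rubric. The key names are hypothetical.
SIGNAL_POINTS = {
    "free_consumer_domain": 1,
    "email_in_breach": 1,
    "no_mx_record": 2,
    "domain_under_30_days": 2,
    "datacenter_ip": 2,
    "tor_exit_node": 4,
    "idp_account_under_7_days": 3,
    "idp_zero_activity": 2,
    "ip_on_blocklist": 3,
}


def risk_score(signals: Iterable[str]) -> Tuple[int, str]:
    """Sum the points for the observed signals and map them to a risk level."""
    # set() ensures a signal reported twice is only counted once.
    total = sum(SIGNAL_POINTS[name] for name in set(signals))
    if total >= 6:
        level = "high"
    elif total >= 3:
        level = "medium"
    else:
        level = "low"
    return total, level
```

For example, a Tor exit node combined with a datacenter IP range sums to 6 and lands in the high-risk band, while a free consumer email domain alone scores 1 and stays low. An unknown signal name raises `KeyError`, which surfaces typos instead of silently scoring them as zero.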

Step 1-B: Send Welcome Email and Verify Deliverability

Send the standard welcome email immediately after account creation. Two things to observe:

  1. Email bounce: if the welcome email bounces (hard bounce = invalid address), immediately suspend the account and log it. This is a near-certain indicator of a fake signup.
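  2. Email open: if the welcome email is not opened within 48 hours, increment the risk score by 1 and add to the next daily review batch. Open tracking relies on a tracking pixel and is unreliable when the mail client blocks or prefetches images, so treat this as a weak signal.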

Tools: Resend, Postmark, or SendGrid all provide delivery and bounce webhooks. Wire these webhooks to update the account's risk record.
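
The webhook wiring depends on the email provider, and each provider uses its own payload format. The sketch below therefore assumes the webhook handler has already normalized the payload into one of three hypothetical event names. It only shows how the account's risk record might be updated. The account fields are likewise illustrative.

```python
def apply_email_event(account: dict, event: str, hours_since_send: float) -> dict:
    """Return an updated copy of the risk record for one delivery event.

    event is a hypothetical normalized name: "hard_bounce", "soft_bounce",
    or "not_opened".
    """
    updated = dict(account)
    if event == "hard_bounce":
        # Invalid address: near-certain fake signup, suspend immediately.
        updated["status"] = "suspended"
    elif event == "soft_bounce":
        updated["in_daily_batch"] = True
    elif event == "not_opened" and hours_since_send >= 48:
        updated["risk_points"] = account.get("risk_points", 0) + 1
        updated["in_daily_batch"] = True
    return updated
```

Returning a copy rather than mutating the input makes it straightforward to log the before and after state alongside the event.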

Step 1-C: Resource Creation Hold for High-Risk Accounts

For accounts that score 6 or higher at signup, apply a resource creation hold. This means the account can log in and explore the console/docs, but cannot create billable resources (compute instances, network endpoints, DNS zones) until a team member manually approves the account or the account completes an email verification challenge.

This is the single most effective control available without a payment method: it eliminates the primary harm (resource abuse) while preserving the ability for the user to evaluate the platform.

Implementation note: in the current Datum IAM model, this can be implemented as an IAM policy binding that withholds create permissions on Organization and Project resources until a flag is cleared.


Stage 2: Post-Signup Monitoring

Post-signup monitoring covers the period from account creation through the first 30 days. The team reviews two categories of signals: behavioral signals collected in real-time, and batch signals reviewed daily.

Real-Time Behavioral Signals (Alert Immediately)

These signals indicate that an account is likely actively abusing the platform right now. They should trigger an immediate Slack/PagerDuty alert.

| Signal | Threshold | Action |
|---|---|---|
| API request rate | >500 API calls/minute from a single account | Throttle + alert |
| Resource creation burst | >10 resources created within 5 minutes of account creation | Suspend + alert |
| Geographic impossibility | API calls from two continents within 60 seconds | Flag + alert |
| Unusual resource type pattern | New account creating only egress-heavy resources (VMs with max outbound, etc.) | Flag + alert |
| Failed API auth attempts | >20 auth failures in 10 minutes | Lockout + alert |
| Project/org creation flood | >3 organizations created by one user in 24 hours | Hold + alert |

Tools: Datum platform audit logs (per the IAM Authentication enhancement, audit logging is a planned feature), or application-level request logging piped to a log aggregator (Grafana Loki, Datadog, or even a daily CSV export from the API layer if no aggregator is in place yet).
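
Until an aggregator is in place, individual thresholds can be checked with a short script over exported log data. As one example, the resource creation burst rule (more than 10 resources within 5 minutes of account creation) might look like the sketch below. The function name and input shape are assumptions: it takes the account creation time and a list of resource creation timestamps.

```python
from datetime import datetime, timedelta
from typing import List


def creation_burst(account_created_at: datetime,
                   resource_created_at: List[datetime],
                   max_resources: int = 10,
                   window: timedelta = timedelta(minutes=5)) -> bool:
    """True when more than max_resources were created inside the window
    that starts at account creation."""
    cutoff = account_created_at + window
    early = [t for t in resource_created_at if account_created_at <= t <= cutoff]
    return len(early) > max_resources
```

The other rate-based rows in the table follow the same pattern: count events inside a time window and compare against the threshold.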

Daily Batch Review

Every business day, one team member runs the daily triage. This should take under 30 minutes at early-stage scale. The review covers all accounts that:

  • Were created in the last 24 hours with a medium risk score (3-5)
  • Were created in the last 7 days and have not yet sent any API traffic (zombie accounts — created but never used, may be pre-positioning)
  • Had a welcome email that soft-bounced or was not opened within 48 hours
  • Triggered any real-time alert that was not already actioned
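
The four selection criteria above could be expressed as a filter over the tracking sheet export. This is a sketch under the assumption that each account row carries the hypothetical fields shown; the field names are not part of any existing Datum schema.

```python
from datetime import datetime, timedelta


def in_daily_batch(account: dict, now: datetime) -> bool:
    """True when an account meets any daily-triage selection criterion."""
    age = now - account["created_at"]
    medium_recent = account["risk_level"] == "medium" and age <= timedelta(hours=24)
    # Zombie account: created in the last 7 days, no API traffic yet.
    zombie = age <= timedelta(days=7) and account["api_call_count"] == 0
    email_issue = account["email_soft_bounced"] or account["email_unopened_48h"]
    open_alert = account["unactioned_alerts"] > 0
    return medium_recent or zombie or email_issue or open_alert
```

Applying this with `[a for a in accounts if in_daily_batch(a, now)]` gives the reviewer a single list to work through.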

For each account in the batch, the reviewer looks at:

  1. What resources (if any) has the account created?
  2. What API calls has the account made?
  3. Does the signup email correspond to a real person or company that can be found externally? (LinkedIn search, company website lookup — 2-minute check)
  4. Has the account interacted with support?

Decision outcomes for each account in the daily batch:

  • Clear: Risk score was a false positive. Mark as reviewed-clean. No action.
  • Watch: Something is slightly off but not conclusive. Set a 7-day watch flag. Review again next week.
  • Challenge: Send a manual email asking the user to confirm their use case ("We noticed your account was created recently and wanted to learn more about how you plan to use Datum Cloud..."). This is also a sales motion. Give 48 hours to respond.
  • Suspend: Confirmed or near-certain abuse. Suspend immediately, follow offboarding process.

Log every decision with: reviewer name, timestamp, account ID, evidence summary, and outcome.
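
If the log is kept as a file rather than a spreadsheet, one simple durable format is JSON Lines, with one record per decision. The following is a sketch under that assumption. The function names and the lowercase outcome labels are illustrative, and the fields mirror the list above.

```python
import json
from datetime import datetime, timezone
from typing import Optional

OUTCOMES = {"clear", "watch", "challenge", "suspend"}


def decision_record(reviewer: str, account_id: str, evidence: str,
                    outcome: str, now: Optional[datetime] = None) -> str:
    """Serialize one review decision as a single JSON line."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps({
        "reviewer": reviewer,
        "timestamp": timestamp,
        "account_id": account_id,
        "evidence_summary": evidence,
        "outcome": outcome,
    }, sort_keys=True)


def append_decision(path: str, record: str) -> None:
    # Append-only writes keep earlier decisions intact.
    with open(path, "a", encoding="utf-8") as log:
        log.write(record + "\n")
```

Rejecting unknown outcomes at write time keeps the log consistent enough to compute the false-positive rate later.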

7-Day and 30-Day Cohort Reviews

In addition to the daily batch, run a cohort review at:

  • 7 days: Check all accounts created 7 days ago that remain in "watch" status.
  • 30 days: Check all accounts that have never created any resource. Accounts that are 30 days old and completely dormant are candidates for proactive outreach ("Can we help you get started?") or eventual cleanup.

Stage 3: Escalation Paths

Escalation Level 1: Suspicious Activity (Internal Review)

Trigger: An account scores medium risk at signup AND exhibits at least one behavioral signal in the first 24 hours, OR a daily batch reviewer is uncertain and wants a second opinion.

Actions:

  1. One reviewer flags the account in the ops tracking sheet with status "escalated-L1".
  2. A second team member independently reviews the same signals within 4 business hours.
  3. If both reviewers agree the account is suspicious, move to Escalation Level 2.
  4. If reviewers disagree, the tie is broken by the on-call ops lead.
  5. Document the decision regardless of outcome.

Escalation Level 2: Probable Abuse (Controlled Restriction)

Trigger: Two reviewers agree on probable abuse, OR a single high-confidence combination of signals is observed (Tor, datacenter IP, and high API rate all at the same time).

Actions:

  1. Apply a resource creation hold if not already in place.
  2. Send a "verify your account" email to the signup address. This serves two purposes: it gives a legitimate user a path back, and it generates evidence of good-faith contact before any further action.
  3. Set a 24-hour timer. If no response, escalate to Level 3.
  4. If the user responds with a plausible explanation (e.g., "I'm a developer testing from a VPN"), review the explanation and either clear or escalate.
  5. Log everything.

Escalation Level 3: Confirmed Abuse (Suspension and Offboarding)

Trigger: Level 2 timer expired with no response, OR direct observation of harmful activity (credential stuffing traffic exiting from Datum IPs, spam sent from Datum compute, etc.).

Actions:

  1. Suspend the account. In the Datum IAM model, this means removing all IAM policy bindings that grant resource access.
  2. Preserve all audit logs for the account. Do not delete any records. Export to a durable, access-controlled location.
  3. Deprovision all resources created by the account (delete VMs, network endpoints, DNS zones, etc.).
  4. If the abuse involved traffic that harmed third parties (spam, DDoS), notify your upstream provider (if applicable) and the relevant abuse reporting channels (abuse@).
  5. Add the signup email domain and IP range to the pre-signup block list if not already present.
  6. Send the account a suspension notification email with a contact address for appeals. This is required in many jurisdictions and is good practice regardless.
  7. Log the full action trail.

Escalation Level 4: Legal or Regulatory Action

Trigger: The abuse involves potential criminal activity (account takeover, ransomware distribution, CSAM, or similar), a formal legal request (subpoena, court order) is received, or the ops team believes law enforcement referral is warranted.

Actions:

  1. Immediately loop in legal counsel. Do not take further technical action until legal provides direction — preservation orders may prohibit deletion.
  2. Preserve all data in its current state. Snapshot relevant infrastructure.
  3. Do not communicate with the account holder.
  4. Respond to any external parties (law enforcement, other abuse contacts) only through legal counsel.
  5. Designate a single named contact for all external communications on this matter.

Stage 4: Offboarding Procedure

When a confirmed-abusive account is offboarded (Escalation Level 3 or higher), execute in this order:

  1. Snapshot audit trail: Export all API audit logs, resource creation records, and reviewer notes for the account. Store in a write-once location (S3 Object Lock, or a Google Drive folder with edit history locked).

  2. Revoke all active sessions: Force-expire all JWTs and OAuth sessions for the user. In the Datum IAM model this is accomplished via the JWT revocation mechanism described in the IAM Authentication enhancement.

  3. Remove IAM policy bindings: Remove all role bindings granting the user permissions at the Organization and Project level.

  4. Deprovision resources: Enumerate and delete all resources owned by the user's organizations and projects. Follow the resource deletion order that avoids dependency conflicts (child resources before parent resources, per the Datum resource hierarchy).

  5. Tombstone the account: Mark the user account as suspended in the user management system. Do not hard-delete the account record — retain it with a suspended flag so the email address cannot be reused to re-register.

  6. Update block lists: Add the email address, email domain (if domain-wide abuse is confirmed), and originating IP range to the pre-signup block list.

  7. Send suspension notice: Send a final email to the signup address documenting the suspension and providing an appeals path.

  8. Post-incident note: Add a brief post-incident entry to the ops log describing: what was observed, what actions were taken, and what (if anything) should change in the workflow to catch this pattern earlier.
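
For the deprovisioning step, "child resources before parent resources" corresponds to a post-order walk of the resource tree. In the sketch below the resource names and the shape of the `children` mapping are made up for illustration, and it assumes the hierarchy is a tree with no shared children. The real Datum hierarchy may differ.

```python
from typing import Dict, List


def deletion_order(children: Dict[str, List[str]], root: str) -> List[str]:
    """Return resources in an order where every child precedes its parent."""
    order: List[str] = []

    def visit(node: str) -> None:
        for child in children.get(node, []):
            visit(child)
        order.append(node)  # appended only after all descendants

    visit(root)
    return order


# Hypothetical hierarchy: one organization, two projects, three resources.
example = {
    "org-1": ["proj-a", "proj-b"],
    "proj-a": ["vm-1", "dns-1"],
    "proj-b": ["vm-2"],
}
```

With this example, `deletion_order(example, "org-1")` yields the VMs and DNS zone first, then each project, and the organization last.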


Stage 5: Metrics and Workflow Health

Track these metrics weekly. If the workflow is healthy, they should stabilize over time.

| Metric | Target | Warning Threshold |
|---|---|---|
| Signup volume (daily) | Baseline + track | 3x week-over-week spike = alert |
| High-risk signups as % of total | <5% | >15% = review gate calibration |
| False-positive rate (cleared accounts that were flagged) | <20% of flagged | >40% = gates are too aggressive |
| Time to suspension for confirmed abuse | <4h business hours | >8h = staffing gap |
| Daily triage time | <30 min/day | >60 min/day = automation needed |
| Bounce rate on welcome email | <2% | >10% = gate calibration needed |
| Accounts suspended per week | Track | Sudden spike = campaign underway |
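
The numeric warning thresholds can be checked mechanically once the weekly figures are collected. In this sketch the dictionary keys and the function name are hypothetical. The "sudden spike" row for suspensions is omitted because the table gives no number for it.

```python
from typing import List


def weekly_warnings(m: dict) -> List[str]:
    """Return the warnings triggered by one week's metrics."""
    checks = [
        # Guard against a zero baseline so an empty prior week does not alert.
        (m["signups_last_week"] > 0
         and m["signups_this_week"] >= 3 * m["signups_last_week"],
         "signup volume: 3x week-over-week spike"),
        (m["high_risk_pct"] > 15, "high-risk share: review gate calibration"),
        (m["false_positive_pct"] > 40, "false positives: gates are too aggressive"),
        (m["suspension_hours"] > 8, "time to suspension: staffing gap"),
        (m["triage_minutes_per_day"] > 60, "triage time: automation needed"),
        (m["welcome_bounce_pct"] > 10, "bounce rate: gate calibration needed"),
    ]
    return [message for triggered, message in checks if triggered]
```

An empty list means no warning threshold was crossed that week. It does not mean every target was met, since targets and warning thresholds differ.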

Tooling Summary

| Stage | Purpose | Recommended Tool | Cost |
|---|---|---|---|
| Pre-signup | Disposable email detection | disposable-email-domains (OSS) | Free |
| Pre-signup | IP reputation | AbuseIPDB | Free tier (1k/day) |
| Pre-signup | IP geolocation + datacenter detection | ip-api.com or MaxMind GeoLite2 | Free |
| Pre-signup | Tor exit node check | dan.me.uk/torlist | Free |
| Pre-signup | WHOIS / domain age | whois.iana.org or domaintools (paid for volume) | Free at low volume |
| Pre-signup | Rate limiting | Cloudflare Rules or app-layer token bucket | Free tier available |
| Signup event | Email deliverability | Resend / Postmark (bounce webhooks) | Low cost |
| Post-signup | Audit log aggregation | Grafana Loki, Datadog, or even a simple structured log | Varies |
| Daily triage | Case tracking | Notion, Linear, or a shared Google Sheet | Free |
| Escalation | Evidence preservation | S3 Object Lock or equivalent write-once store | Low cost |
| Escalation | Internal alerting | Slack with a dedicated #signup-alerts channel | Free |

Flowchart

flowchart TD
    A([User visits signup page]) --> B[Gate 0-A: Email domain check]

    B --> B1{Disposable domain?}
    B1 -- Yes --> BLOCK1[Hard block<br/>Show error message]
    B1 -- No --> B2{No MX record?}
    B2 -- Yes --> HOLD1[Require manual approval<br/>Before access granted]
    B2 -- No --> C[Gate 0-B: IP reputation check]

    C --> C1{Tor exit node?}
    C1 -- Yes --> C2[Allow but require<br/>email verification<br/>Log heavily]
    C1 -- No --> C3{Datacenter IP<br/>or blocklist?}
    C3 -- Yes --> C4[Allow signup<br/>Flag high-risk<br/>For post-signup review]
    C3 -- No --> D[Gate 0-C: Rate limit check]

    D --> D1{Rate limit<br/>exceeded?}
    D1 -- Yes --> BLOCK2[Reject submission<br/>Return retry-after]
    D1 -- No --> E[Gate 0-D: IdP signal collection]

    E --> E1[Collect IdP account age<br/>activity signals]
    E1 --> F[Account created<br/>Stage 1: Signup event handling]

    F --> G[Compute risk score<br/>from all gate signals]
    G --> G1{Risk score}

    G1 -- 0-2<br/>Low --> H1[Add to daily<br/>review batch]
    G1 -- 3-5<br/>Medium --> H2[Send welcome email<br/>with verification prompt<br/>Review within 24h]
    G1 -- 6+<br/>High --> H3[Apply resource<br/>creation hold<br/>Assign same-day review]

    H1 --> I[Stage 2: Post-signup monitoring]
    H2 --> I
    H3 --> I

    I --> J[Real-time behavioral<br/>monitoring active]
    J --> J1{Alert<br/>triggered?}
    J1 -- Yes --> K[Immediate alert<br/>to ops channel]
    K --> L[On-call reviews<br/>within 1 hour]
    J1 -- No --> M[Daily triage<br/>batch review]

    M --> M1{Daily review<br/>outcome}
    M1 -- Clear --> M2[Mark reviewed-clean<br/>No action]
    M1 -- Watch --> M3[Set 7-day<br/>watch flag]
    M1 -- Challenge --> M4[Send manual<br/>outreach email]
    M1 -- Suspend --> N

    L --> L1{Reviewer<br/>decision}
    L1 -- Clear --> M2
    L1 -- Escalate --> ESC1

    M4 --> M4a{User<br/>responds?}
    M4a -- Yes, plausible --> M2
    M4a -- No response in 48h --> ESC1
    M4a -- Implausible --> ESC1

    M3 --> M3a{7-day<br/>recheck}
    M3a -- Still suspicious --> ESC1
    M3a -- Cleared --> M2

    ESC1([Escalation L1<br/>Internal review<br/>Two-reviewer check]) --> ESC1a{Both reviewers<br/>agree?}
    ESC1a -- No --> ESC1b{Ops lead<br/>breaks tie}
    ESC1a -- Yes, suspicious --> ESC2
    ESC1a -- Yes, not suspicious --> M2
    ESC1b -- Suspicious --> ESC2
    ESC1b -- Not suspicious --> M2

    ESC2([Escalation L2<br/>Probable abuse<br/>Controlled restriction]) --> ESC2a[Apply resource hold<br/>Send verification email<br/>Start 24h timer]
    ESC2a --> ESC2b{User responds<br/>within 24h?}
    ESC2b -- Plausible --> M2
    ESC2b -- No response<br/>or implausible --> ESC3

    ESC3([Escalation L3<br/>Confirmed abuse<br/>Suspension]) --> N

    N([Offboarding]) --> N1[Snapshot audit trail<br/>to write-once store]
    N1 --> N2[Revoke all sessions<br/>and JWT tokens]
    N2 --> N3[Remove all IAM<br/>policy bindings]
    N3 --> N4[Deprovision all<br/>user resources]
    N4 --> N5[Tombstone account<br/>Prevent re-registration]
    N5 --> N6[Update pre-signup<br/>block lists]
    N6 --> N7[Send suspension<br/>notice with appeals path]
    N7 --> N8[Write post-incident<br/>note to ops log]

    N8 --> O{Criminal activity<br/>or legal request?}
    O -- Yes --> ESC4([Escalation L4<br/>Legal counsel<br/>Law enforcement])
    O -- No --> END([Done])

    style BLOCK1 fill:#c0392b,color:#fff
    style BLOCK2 fill:#c0392b,color:#fff
    style HOLD1 fill:#e67e22,color:#fff
    style H3 fill:#e67e22,color:#fff
    style ESC1 fill:#f39c12,color:#fff
    style ESC2 fill:#e67e22,color:#fff
    style ESC3 fill:#c0392b,color:#fff
    style ESC4 fill:#8e44ad,color:#fff
    style N fill:#c0392b,color:#fff
    style M2 fill:#27ae60,color:#fff
    style END fill:#27ae60,color:#fff

Key Decision Points

The following table consolidates every decision point in the workflow along with the criteria used at each one. This is intended as a reference card for team members executing the process.

| Decision Point | Location | Criteria | Outcomes |
|---|---|---|---|
| Disposable email? | Gate 0-A | Domain appears in disposable-email-domains list | Hard block |
| No MX record? | Gate 0-A | DNS MX query returns empty | Hold for manual approval |
| IP is Tor exit? | Gate 0-B | IP appears in Tor exit node list | Allow with email verification + heavy logging |
| IP is datacenter/blocklist? | Gate 0-B | IP falls in known hosting range or AbuseIPDB score >50 | Allow but flag high-risk |
| Rate limit exceeded? | Gate 0-C | >3 attempts/hr per IP, or >5/hr per domain | Reject, return retry-after header |
| IdP account brand-new? | Gate 0-D | Account age <7 days AND zero activity | Flag high-risk |
| Risk score threshold | Stage 1 | Additive score from all gate signals | 0-2 low, 3-5 medium, 6+ high |
| Welcome email bounced? | Stage 1-B | Hard bounce from email provider webhook | Immediate suspend |
| Real-time alert threshold | Stage 2 | Any signal in behavioral signal table exceeds threshold | Immediate ops alert |
| Daily triage outcome | Stage 2 | Reviewer assessment of resources, API usage, identity | Clear / Watch / Challenge / Suspend |
| Challenge response | Stage 2 | User replies within 48h with plausible explanation | Clear or escalate |
| 7-day watch recheck | Stage 2 | Still no organic activity or explanation after 7 days | Escalate L1 or clear |
| Two-reviewer agreement | Escalation L1 | Both independent reviewers assess account as abusive | Escalate to L2 |
| Verification response | Escalation L2 | Account holder responds to verification email within 24h | Clear or escalate to L3 |
| Criminal/legal trigger | Offboarding | Activity involves third-party harm, law enforcement, or legal request | Escalate to L4 |

Implementation Plan

This is ordered by implementation priority. Steps 1-4 can be completed without any platform code changes.

  1. Stand up the ops tracking sheet. Create a shared spreadsheet (or Linear project) with columns matching the risk event schema in Step 1-A. This is the single source of truth for all reviewed accounts.

  2. Configure email provider bounce webhooks. Wire hard and soft bounce events from the email provider (Resend, Postmark, etc.) to the ops channel. Test with a known-bad address.

  3. Implement disposable email domain check at the signup form. This is a client-side or server-side validation against the open-source domain list. Deploy first in warn-only mode, observe false-positive rate for one week, then switch to block mode.

  4. Implement IP reputation lookup at signup. Call ip-api.com or AbuseIPDB at form submission. Log the result as a field on the user record. Do not block yet — accumulate data for one week to calibrate thresholds.

  5. Implement signup rate limiting at the edge or application layer. Start with the thresholds in Gate 0-C. Monitor for false positives (legitimate company onboarding hitting the domain rate limit is the most likely one).

  6. Implement the risk score computation and the signup event emission to the ops channel. Initially this can be a simple script or webhook triggered on account creation.

  7. Define the resource creation hold mechanism in the Datum IAM layer. In the current model, this is an IAM policy that withholds create permissions until a flag is cleared. Define the flag format and the manual workflow for clearing it.

  8. Establish the daily triage calendar block. Assign a rotating on-call reviewer. Run the triage for two weeks and note: how long it takes, how many accounts are reviewed, how many are false positives. Use this to tune the risk score thresholds.

  9. Add IdP-level signal collection (GitHub account age and activity via GitHub API). Integrate into the risk score. GitHub is simpler because the API is public; deprioritize Google until it proves necessary.

  10. Write the escalation runbooks as short Notion pages or internal wiki entries (one page per escalation level). Link them from the ops tracking sheet.

  11. Review and tune thresholds at the 30-day mark based on observed false-positive and false-negative rates.


Handoff

Decisions Made

  • No CAPTCHA is included in this design. CAPTCHA is increasingly defeatable by automated solvers and creates friction for legitimate users. The combination of email domain filtering, IP reputation, IdP signal collection, and rate limiting is expected to provide comparable protection with less user impact. This should be validated against the Stage 5 metrics once the gates are live.

  • Credit card collection is explicitly not being added as a gate. The request was to design the best manual process within the current constraints (no CC, no identity verification). Adding payment details is a separate product decision.

  • The resource creation hold is the primary abuse-mitigation control for high-risk accounts. Without a payment gate, this is the most effective lever: it decouples "can explore the platform" from "can consume resources we pay for."

  • Hard deletion of abusive accounts is explicitly avoided. Tombstoning (suspend without delete) prevents email address reuse and preserves the audit record. Legal counsel should be consulted before any hard deletion.

  • Risk scores are intentionally simple (additive integer). At small team scale, a complex ML scoring model is not maintainable. The additive model is auditable, explainable, and can be tuned by adjusting individual signal weights.

Open Questions

  • What is the current mechanism for applying IAM policy holds on account creation? The design assumes this can be done via IAM policy bindings, but the exact field/flag needs confirmation from the platform team. (Blocks Step 7 of the implementation plan.)

  • Is there a current audit log aggregation solution? The post-signup behavioral monitoring depends on structured API audit logs being queryable. If no aggregator is in place, the real-time alerting thresholds cannot be implemented until one is chosen. (Non-blocking for pre-signup gates and daily triage.)

  • What is the designated appeals email address for suspended accounts? The offboarding step requires a published contact for appeals. Legal/policy should confirm before the suspension notice template is written. (Non-blocking for technical implementation.)

  • Does Datum have an existing relationship with a WHOIS or domain intelligence API vendor for higher-volume lookups? At low signup volume, free WHOIS lookups are sufficient. Above approximately 500 signups/day, a paid API will be needed to avoid rate limiting. (Non-blocking today.)

Implementation Notes

  • For the ops team: the daily triage is the most important habit to establish. Everything else in this workflow is in service of making that triage fast and accurate. Start with the triage process before investing in automation.

  • For the platform team: the resource creation hold mechanism is the highest-priority platform change requested. It should be implementable as an IAM policy condition without schema changes, but confirm with the IAM team.

  • For legal/compliance: the offboarding procedure preserves all records before action. If a legal hold notice arrives for an account that is already suspended, confirm that the tombstone record and audit export satisfy the hold requirements.

  • For the security team: the IdP-level signals (especially GitHub account age) are the most novel and highest-signal additions in this workflow. They are worth prioritizing in the implementation plan. A zero-activity, week-old GitHub account signing up for infrastructure resources is a strong indicator of abuse.
