@masayag
Last active February 24, 2026 15:43

MCO/Thanos Cost Data Replacement Plan

Plan for replacing CMMO + Ingress with ACM MCO/Thanos for cost-onprem-chart (OCP cost/usage data). Use this document to agree on architecture, split into work items, and assign research per option to different agents.


Architecture Decision Record (ADR-0001)

Status: Planning
Date: 2025-02-15
Deciders: Cost/Insights on-premise architecture

Context

Cost-onprem is increasingly deployed on top of ACM (Advanced Cluster Management / Stolostron), which already runs MCO (multicluster-observability-operator) and Thanos to collect metrics from all managed clusters. Running CMMO on every cluster duplicates collection and adds operational overhead for those managed by ACM.

Design objectives (to be confirmed with product):

  • When cost-onprem-chart is installed on top of ACM, the deployment keeps the same overall architecture as today, including insights-ingress. Only the data path for ACM-managed clusters changes: those clusters use MCO/Thanos and the Thanos Bridge instead of CMMO.
  • For other clusters that must report to cost-management (e.g. the hub cluster, or any cluster not managed by ACM), CMMO remains the tool; they continue to report via CMMO → insights-ingress.
  • Remove CMMO and insights-ingress from the cost/ROS data path only for clusters that report their metrics to Thanos (whether via MCO or any other mechanism). The option to use CMMO and insights-ingress remains available for clusters that are not managed by ACM, or for the ACM hub itself.
  • Keep koku's existing consumer and OCP processing pipeline unchanged (no breaking contract).
  • Resolve org_id without JWT for the Thanos path (MCO has no user identity).

If product decides otherwise: we may need to ensure that in ACM all clusters (including the hub) report their metrics via MCO/Thanos with the same data as managed clusters, so that CMMO and insights-ingress could be removed from the cost path entirely. That would require hub (and any non-managed cluster) to be part of the observability stack and would be a different scope.

Decision

The Thanos-based data-ingestion path is added in addition to the existing CMMO-equivalent path. It does not replace it. Clusters that are not managed by ACM continue to report metrics via CMMO and insights-ingress; only clusters reporting to Thanos (when the feature is enabled) use the Thanos Bridge path. Both paths feed the same Kafka topic and koku consumer.

We will:

  1. Extend MCO so Thanos receives the same metrics CMMO collects (pod, node, namespace, storage, VM, ROS), via allowlist and rules in multicluster-observability-operator, with a stable cluster label for filtering.
  2. Add a "Thanos Bridge" inside masu (same codebase/repo): a scheduled task or job that:
    • Lists OCP sources from the koku DB (cluster_id, org_id).
    • For each cluster and a configurable time window: queries Thanos (PromQL), transforms Prometheus time series into CMMO-equivalent CSVs, builds manifest.json and tar.gz, uploads to S3, and publishes to platform.upload.announce with Kafka header service: "hccm" and the same JSON body as ingress (url, request_id, org_id, b64_identity).
    • Uses org_id only from koku (Sources table); no MCO changes for auth.
  3. Keep the koku contract unchanged: same topic, same message shape, same payload format (tar.gz + manifest.json + CSVs). The existing Kafka message handler and OCP processors are not modified.
  4. Idempotency: Use a deterministic manifest UUID per (cluster_id, time_window) (e.g. UUID5); koku's get_or_create and record_report_status make re-runs idempotent. Optional bridge-side "last processed" cursor per cluster for efficiency.
  5. Feature toggles (cost-onprem-chart): Add ingress.enabled and thanosBridge.enabled. Enable thanosBridge.enabled for Thanos-reporting clusters (do not install CMMO on those). Keep ingress.enabled true when the customer has non-Thanos clusters that must report via the existing CMMO → insights-ingress path; it may be false when all clusters report to Thanos.
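The deterministic manifest UUID from step 4 can be sketched with the standard library. The namespace UUID and the naming scheme below are illustrative assumptions, not a fixed contract; the only requirement is that the same (cluster_id, time_window) always yields the same UUID:

```python
import uuid

# Illustrative fixed namespace; any constant UUID works as long as it never changes.
BRIDGE_NAMESPACE = uuid.NAMESPACE_DNS

def manifest_uuid(cluster_id: str, window_start: str, window_end: str) -> uuid.UUID:
    """Same (cluster_id, time_window) -> same UUID, so bridge re-runs are idempotent."""
    name = f"{cluster_id}/{window_start}/{window_end}"
    return uuid.uuid5(BRIDGE_NAMESPACE, name)

# Re-running the bridge for the same window yields the same manifest UUID:
a = manifest_uuid("my-cluster", "2025-02-15T00:00:00Z", "2025-02-15T06:00:00Z")
b = manifest_uuid("my-cluster", "2025-02-15T00:00:00Z", "2025-02-15T06:00:00Z")
assert a == b
```

With this, koku's get_or_create sees the same manifest identity on every re-run of a window, and record_report_status prevents double processing.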

Rejected alternatives:

  • Thin proxy in front of Thanos: Extra service and network hop; org_id still must come from koku. Deferred unless operational boundaries require a separate service.
  • MCO writes report blobs to S3: Dual-write and more MCO/collector changes; two formats to maintain. Not chosen for first phase.

Consequences

Positive:

  • Single source of metrics (Thanos) for cost and ROS for Thanos-reporting clusters.
  • For those clusters, CMMO and insights-ingress are not in the path; for the hub and other non-Thanos clusters, CMMO → insights-ingress remains; overall architecture (including insights-ingress) is unchanged.
  • No changes to koku's consumer or OCP pipeline; same S3/Kafka contract.
  • Tenant (org_id) is explicit and auditable from koku DB.

Negative / Risks:

  • Bridge must mirror CMMO semantics (PromQL → CSV transform); metric names/labels in Thanos must align with CMMO.
  • Cluster identity mapping (Thanos cluster label ↔ koku cluster_id) must be defined and implemented; optional managed_cluster_name mapping if ACM uses different identifiers.

Follow-up:

  • MCO: extend allowlist and rules; confirm cluster label.
  • multicluster-observability-addon: verify it does not drop new metrics (no code change expected).
  • Implementation: bridge module, scheduling (cron/Celery), cluster identity mapping, cost-onprem-chart toggles, tests, and docs.

Component Diagram

The target state has two coexisting data paths that converge at the same Kafka topic and koku consumer. The CMMO → insights-ingress path serves clusters not reporting to Thanos (e.g. the ACM hub, non-ACM clusters). The Thanos Bridge path serves clusters whose metrics are already in Thanos.

```mermaid
flowchart TB
    subgraph Non-Thanos Clusters
        CMMO[CMMO]
    end

    subgraph ACM-Managed Clusters
        Prom[Prometheus]
        MCO[MCO metrics-collector]
    end

    subgraph Hub / Cost-onprem
        Ingress[Insights Ingress]
        Thanos[Thanos Query]
        Bridge[Thanos Bridge<br/>masu scheduled job]
        DB[(Koku DB<br/>Sources, Providers)]
        S3[(S3 staging)]
        Kafka[[Kafka<br/>platform.upload.announce]]
        Handler[Kafka Message Handler<br/>masu listener]
        Processor[Masu OCP Processor]
    end

    %% Path 1: CMMO
    CMMO -->|Upload tar.gz<br/>with JWT| Ingress
    Ingress -->|Stage payload| S3
    Ingress -->|Publish msg<br/>service: hccm| Kafka

    %% Path 2: Thanos Bridge
    Prom -->|Scrape| MCO
    MCO -->|Remote write| Thanos
    Bridge -->|List OCP sources<br/>cluster_id, org_id| DB
    Bridge -->|PromQL queries| Thanos
    Bridge -->|Upload tar.gz| S3
    Bridge -->|Publish msg<br/>service: hccm| Kafka

    %% Shared downstream
    Kafka --> Handler
    Handler -->|Download payload| S3
    Handler -->|Resolve tenant| DB
    Handler --> Processor
```

Sequence Diagram

Both data paths are shown below. Path 1 is the existing CMMO flow for clusters not reporting to Thanos. Path 2 is the new Thanos Bridge flow for Thanos-reporting clusters. Both produce the same Kafka message and payload format; the downstream handler and processor are unchanged.

```mermaid
sequenceDiagram
    box Non-Thanos Cluster
        participant CMMO as CMMO
    end
    box ACM-Managed Cluster
        participant MCO as MCO metrics-collector
    end
    box Hub / Cost-onprem
        participant Ingress as Insights Ingress
        participant Thanos as Thanos Query
        participant Bridge as Thanos Bridge (masu)
        participant DB as Koku DB
        participant S3 as S3 staging
        participant Kafka as Kafka
        participant Handler as Kafka Msg Handler
        participant Processor as Masu Processor
    end

    Note over CMMO,Ingress: Path 1 — CMMO (hub, non-ACM clusters)
    CMMO->>CMMO: Collect Prometheus metrics, build CSVs + manifest, package tar.gz
    CMMO->>Ingress: Upload tar.gz (JWT with org_id)
    Ingress->>S3: Stage payload
    S3-->>Ingress: url
    Ingress->>Kafka: Publish (service: hccm, org_id, url, request_id, b64_identity)

    Note over MCO,Bridge: Path 2 — Thanos Bridge (ACM-managed clusters)
    MCO->>Thanos: Remote write metrics
    Note over Bridge: Scheduled
    Bridge->>DB: List OCP sources (cluster_id, org_id)
    loop Per Thanos-reporting cluster
        Bridge->>Thanos: PromQL (pod, node, storage, VM, ROS)
        Thanos-->>Bridge: Time series
        Bridge->>Bridge: Transform to CMMO-equivalent CSVs + manifest.json
        Bridge->>Bridge: Package tar.gz
        Bridge->>S3: Upload tar.gz
        S3-->>Bridge: url
        Bridge->>Kafka: Publish (service: hccm, org_id, url, request_id, b64_identity="")
    end

    Note over Kafka,Processor: Shared downstream — unchanged
    Kafka->>Handler: Consume message
    Handler->>S3: Download payload
    Handler->>Handler: extract_payload, parse manifest
    Handler->>DB: get_source_and_provider_from_cluster_id(cluster_id, org_id)
    Handler->>Processor: process_report (unchanged)
```

1. Data-Flow Description (Current → Koku)

1.1 Current flow (CMMO → Koku)

  1. CMMO on each OCP cluster

    • Queries Prometheus (e.g. every 15 min).
    • Collects pod CPU/memory, storage (PVC), node/namespace labels, etc.
    • Periodically (e.g. every 6 h) builds a tar.gz with manifest.json + CSVs (pod_usage, storage_usage, node_labels, namespace_labels, etc.).
    • Authenticates with JWT (x-rh-identity) that carries org_id (and optionally account).
    • POSTs the tar.gz to the ingress URL (content-type e.g. application/vnd.redhat.hccm.*+tgz).
  2. Ingress (insights-ingress-go)

    • Validates content-type and service (e.g. hccm).
    • Reads org_id (and account) from the JWT (identity.GetIdentity).
    • Stores the file in S3-compatible storage (keyed by request_id, org, etc.).
    • Generates a presigned URL to that object.
    • Publishes to Kafka topic platform.upload.announce a message containing: request_id, account, org_id, url, service (e.g. hccm), plus size, timestamp, etc.
    • Response to client: 202 Accepted with request_id and upload metadata.
  3. Koku listener

    • Consumes platform.upload.announce, filters by header service == "hccm".
    • For each message: reads org_id from the payload; downloads the tar.gz from the presigned URL; extracts manifest.json and CSV files.
  4. Koku processing (masu)

    • Parses manifest: cluster_id, uuid, date, files[].
    • Resolves provider: get_source_and_provider_from_cluster_id(cluster_id, org_id) → Sources + Provider (cluster must be registered in Sources for that org_id).
    • If no source: logs and skips (no tenant).
    • Creates/updates CostUsageReportManifest, copies reports to local dir, create_daily_archives (splits CSVs by day, validates/sanitizes, uploads daily CSVs to S3).
    • Line-item processing per report file (OCP report processor), then summarization (Celery).
    • Sends validation (success/failure) to platform.upload.validation.

Important steps after "ingress stores and announces": Kafka → download by URL → extract manifest + CSVs → resolve cluster_id + org_id to Provider → daily archives to S3 → report processing → summarization → validation.
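For reference, the manifest the bridge must reproduce looks roughly like this. The field values and file-naming scheme are illustrative; the authoritative schema is whatever CMMO emits and koku's parse_manifest accepts:

```python
import json

# Illustrative manifest.json content; exact fields come from CMMO / koku's parser.
manifest = {
    "uuid": "3c6e4d0a-0000-0000-0000-000000000000",
    "cluster_id": "my-cluster",
    "date": "2025-02-15 00:00:00",
    "files": [
        "my-cluster_pod_usage.csv",
        "my-cluster_storage_usage.csv",
        "my-cluster_node_labels.csv",
        "my-cluster_namespace_labels.csv",
    ],
}
print(json.dumps(manifest, indent=2))
```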


2. Replacing CMMO + Ingress with MCO/Thanos + New Component

2.1 Rationale and target architecture

  • cost-onprem-chart is deployed on ACM (stolostron).
  • ACM already has MCO (multicluster-observability-operator) and Thanos: metrics from all managed clusters are collected (Metrics Collector or MCOA) and written to Thanos (Observatorium API → Thanos Receive → object storage).
  • Goal: Remove CMMO and ingress from the cost/ROS data path only for clusters that report their metrics to Thanos (whether via MCO or any other mechanism). Use a new pipeline for those clusters: MCO/Thanos → Thanos Bridge → same contract as today (S3 + Kafka) so Koku stays unchanged. The existing CMMO → insights-ingress path remains available for clusters not managed by ACM and for the ACM hub itself.

2.2 What must be true for the new pipeline

  • Metrics in Thanos are equivalent in content to what CMMO produces (same logical series/labels so you can reconstruct pod/node/namespace usage and labels).
  • A new component runs in the cost-onprem/masu environment; it:
    • Reads from Thanos (e.g. Thanos Query),
    • Transforms that data into the same CSV + manifest format Koku expects,
    • Writes tar.gz (or equivalent) to the same S3 bucket (or same path scheme) that Koku uses,
    • Publishes a Kafka message in the same shape as ingress (platform.upload.announce, service: hccm, request_id, account, org_id, url).
  • Koku keeps consuming platform.upload.announce (service hccm), downloading by URL, and running existing extraction + provider resolution + processing. No change to the "after Kafka" data flow.
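The announce message the new component publishes would mirror what ingress sends today. A sketch, with field names taken from the flow above and all values illustrative (whether b64_identity stays empty on the bridge path or carries a synthesized identity is a design detail to confirm against koku's handler):

```python
import json

# Illustrative platform.upload.announce body; the Kafka header carries
# service: "hccm". org_id comes from the koku DB (Sources), not from a JWT.
message = {
    "request_id": "req-0001",
    "org_id": "12345",
    "account": "",  # optional; included when known from Sources
    "url": "https://s3.example.internal/staging/req-0001.tar.gz",  # presigned URL
    "b64_identity": "",  # no JWT on the Thanos Bridge path
}
print(json.dumps(message))
```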

2.3 Architectural design options

Option A – Thanos → "Thanos-to-Koku" service (recommended baseline)

  • New service (e.g. under masu): scheduled (e.g. every 6 h).
  • Input: Thanos Query API (PromQL) or Store API; time range and cluster_id (from Thanos external labels).
  • Logic: For each (cluster_id, interval): query Thanos for the metrics that correspond to CMMO's pod_usage, storage_usage, node/namespace labels; aggregate/transform into the exact CSV column set Koku uses; build manifest.json + CSVs; pack tar.gz; upload to S3; produce Kafka message with presigned URL and org_id (see 2.5).
  • Deploy: Single instance (or one per tenant/scheduler) in the hub/cost-onprem namespace; no CMMO on managed clusters; no ingress for hccm.
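Querying Thanos from this service can use the standard Prometheus HTTP API that Thanos Query exposes. A minimal stdlib-only sketch; the endpoint URL, the `cluster` label name, and the chosen metric are assumptions to be confirmed against the actual MCO setup:

```python
import json
import urllib.parse
import urllib.request

THANOS_URL = "http://thanos-query:9090"  # assumed in-cluster Thanos Query service

def pod_cpu_query(cluster: str) -> str:
    """PromQL for pod CPU usage, assuming MCO tags each series with a `cluster` label."""
    return ("sum by (namespace, pod) "
            f'(rate(container_cpu_usage_seconds_total{{cluster="{cluster}"}}[5m]))')

def query_range(promql: str, start: str, end: str, step: str = "15m") -> list:
    """Range query against the Prometheus-compatible API served by Thanos Query."""
    params = urllib.parse.urlencode(
        {"query": promql, "start": start, "end": end, "step": step}
    )
    with urllib.request.urlopen(
        f"{THANOS_URL}/api/v1/query_range?{params}", timeout=60
    ) as resp:
        return json.load(resp)["data"]["result"]

# Usage (requires a reachable Thanos Query endpoint):
# series = query_range(pod_cpu_query("my-cluster"),
#                      "2025-02-15T00:00:00Z", "2025-02-15T06:00:00Z")
```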

Option B – MCO writes "cost/usage" export to object storage

  • Extend MCO (or a sidecar/controller) to periodically export from Thanos (or from the collector) into a defined format (e.g. same CSV schema) in object storage (e.g. same bucket as Thanos or a dedicated bucket).
  • New component only: discovers new export files (e.g. by listing or events), moves/copies them to the path Koku expects, generates manifest.json, and publishes Kafka with URL + org_id.
  • Pro: Less "query and transform" in the new component. Con: Requires MCO changes and a stable export schema and discovery story.

Option C – Hybrid: MCO addon exposes cost metrics; component queries Thanos

  • multicluster-observability-addon (MCOA) (or legacy metrics collector) is extended so that the same metrics CMMO would collect (pod CPU/memory, PVC, node/namespace labels) are included in the allowlist and sent to Thanos (with cluster identity in labels).
  • New component is as in Option A: query Thanos → transform → CSV + manifest → S3 + Kafka.
  • This minimizes MCO changes (only allowlist + possibly recording rules) and keeps transformation in one place.

Recommendation: Option A + C — extend MCO/MCOA so Thanos has "Koku/ROS" metrics; add a single Thanos-to-Koku service that queries Thanos, produces CSV+manifest, writes S3, and publishes Kafka.

2.4 MCO / MCOA extensions

  • Metrics:
    • Add to MCO's metrics allowlist (or MCOA scrape config) every metric (and label) needed to reconstruct:
      • Pod usage: CPU/memory request, limit, usage (e.g. container_cpu_*, container_memory_*, pod/node/namespace labels).
      • Storage: PVC/PV usage and capacity.
      • Node/namespace labels for cost and ROS.
    • Map these to the same semantics as CMMO's CSV columns (see koku docs/architecture/csv-processing-ocp.md and cost-onprem data-sources doc).
  • Thanos:
    • Ensure each series is tagged with a cluster identifier (e.g. cluster or managed_cluster from MCO) so the new component can slice by cluster and time.
  • multicluster-observability-addon:
    • If using MCOA: add the same metrics to the ScrapeConfig (or equivalent) used for user workload / cost; ensure cluster_id is in labels.
    • No change to Thanos Receive/Store API; only to what is scraped and forwarded.

2.5 Org_id / tenant mapping (critical)

  • Today: org_id comes from the JWT the CMMO sends to ingress. Ingress puts it in the Kafka message; Koku uses (cluster_id, org_id) to find Sources and Provider (get_source_and_provider_from_cluster_id(cluster_id, org_id)).
  • With MCO: There is no JWT; metrics are pushed by the addon/collector with cluster identity, not org_id.

Design for org_id:

  1. Cluster → tenant in Koku DB

    • In cost-onprem, Sources (and Provider) already bind cluster_id to org_id (tenant).
    • So the authoritative mapping is: cluster_id → org_id from Koku's Sources (and Provider) tables.
  2. New component must have org_id before publishing Kafka

    • The component runs in the masu/Koku environment and has DB access.
    • For each cluster_id it wants to emit a report for:
      • Query Koku DB: get Sources (or Provider) where provider.authentication.credentials.cluster_id == cluster_id and take that source's org_id (and optionally account).
      • If no source: skip that cluster (or log and retry later); do not publish a message without org_id (Koku would drop it anyway).
    • Publish Kafka with that org_id (and account if present) so the message is identical to what ingress would send.
  3. Ordering / bootstrap

    • Providers (Sources) must be created before the new component can report for that cluster (e.g. via UI/API when the user "adds" the cluster to Cost Management).
    • So: cluster registered in Sources with org_id → component can resolve cluster_id → org_id → publish message. No change to Koku's "unknown organization" or "unexpected OCP report" handling.
  4. Optional: ACM as source of org_id

    • If ManagedCluster (or a custom resource) ever carries an "org_id" or "tenant" annotation, the component could use that as a hint or fallback, but Koku's source of truth should remain Sources/Provider so that behavior stays consistent with today.
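The cluster_id → org_id lookup in step 2 is a plain DB read in the masu environment. A self-contained sketch using SQLite as a stand-in; the table and column names are hypothetical (koku actually uses Django models for Sources/Provider, not raw SQL like this):

```python
import sqlite3

# Hypothetical minimal schema standing in for koku's Sources table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources (cluster_id TEXT PRIMARY KEY, org_id TEXT, account TEXT)"
)
conn.execute("INSERT INTO sources VALUES ('my-cluster', '12345', '')")

def resolve_org_id(cluster_id: str):
    """Return (org_id, account) for a registered cluster, or None to skip it."""
    row = conn.execute(
        "SELECT org_id, account FROM sources WHERE cluster_id = ?", (cluster_id,)
    ).fetchone()
    return row  # None if the cluster has no Source yet: do not publish for it

assert resolve_org_id("my-cluster") == ("12345", "")
assert resolve_org_id("unregistered") is None
```

The `None` branch captures the bootstrap rule above: a cluster not yet registered in Sources produces no message.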

2.6 Placement of the new component (masu)

  • Location: Implement as a service under masu (same codebase/deployment boundary): same DB, same config, same tenant model.
  • Execution:
    • Scheduled: e.g. Celery Beat job every N hours (e.g. 6), or a dedicated cron-like pod that runs "Thanos → transform → S3 → Kafka" for a time window.
  • Future: Per-tenant schedule (e.g. different intervals per org_id) can be added later via config or DB.
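If Celery Beat is chosen, the schedule is a one-line configuration entry. A sketch; the task module path and the 6-hour interval are placeholders:

```python
from datetime import timedelta

# Hypothetical Celery Beat entry; task path and interval are placeholders.
CELERY_BEAT_SCHEDULE = {
    "thanos-bridge-run": {
        "task": "masu.thanos_bridge.tasks.run_bridge",  # hypothetical task module
        "schedule": timedelta(hours=6),
    },
}
```

A per-tenant schedule later would just mean one such entry (or a DB-driven schedule) per org_id.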

2.7 Summary of changes

| Area | Change |
| --- | --- |
| MCO / MCOA | Extend metrics allowlist (and scrape config) so Thanos has Koku/ROS metrics with cluster labels. |
| New component (masu) | Thanos Query → transform to CMMO-like CSV + manifest → upload tar.gz to S3 → publish platform.upload.announce (hccm) with org_id from Koku DB (cluster_id → Sources → org_id). |
| Koku | No change to Kafka consumer or processing; still requires org_id in message and registered Source per cluster. |
| Ingress | Still deployed; not used for Thanos-reporting clusters but remains for hub and non-ACM clusters using CMMO. |
| CMMO | Removed only from clusters reporting to Thanos; remains for hub and non-ACM clusters. |

3. Resource Plan and Savings

3.1 Objectives

  • Size the new "Thanos-to-Koku" component (CPU/memory) for different cluster and tenant counts.
  • Compare aggregate cost of "one MCO + one new component" vs "CMMO on every cluster."

3.2 Sizing the new component (Thanos-to-Koku)

  • Work per run: For each (cluster, time_window): query Thanos → build CSVs + manifest → compress → upload S3 → one Kafka message per payload.
  • Drivers: Number of clusters, pods/nodes per cluster, time window (e.g. 6 h), report types (pod, storage, labels).

Proposed sizing matrix (requests/limits):

| Clusters | Tenants (org_id) | Pods (total) | CPU request | Memory request | Notes |
| --- | --- | --- | --- | --- | --- |
| 5 | 1–2 | ~500 | 200m | 512 Mi | Dev/small |
| 10 | 2–5 | ~2k | 500m | 1 Gi | Small prod |
| 25 | 5–10 | ~5k | 1 | 2 Gi | Medium |
| 50 | 10–20 | ~10k | 2 | 4 Gi | Large |
| 100+ | 20+ | ~20k+ | 4 | 8 Gi | Scale test / split jobs |
  • Execution: Run as scheduled job (CronJob or Celery) so average utilization is low; peak during the run.
  • Parallelism: For large cluster counts, process clusters in batches (e.g. 10 at a time) to cap memory and avoid overloading Thanos.
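Batching clusters to cap memory and Thanos load can be as simple as slicing the cluster list per run. A sketch; the batch size of 10 matches the example above and `process_batch` is a placeholder:

```python
from itertools import islice

def batched(items, size):
    """Yield successive fixed-size batches from an iterable."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

clusters = [f"cluster-{i}" for i in range(25)]
for batch in batched(clusters, 10):
    # process_batch(batch)  # query Thanos / build payloads for these clusters only
    pass
```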

3.3 Saving: CMMO on every cluster vs one MCO

  • Current (CMMO): Per-cluster: CMMO pod(s) (CPU/memory) + Prometheus queries + network to ingress. Ingress: 1 deployment for all clusters.
  • New (MCO + component): No CMMO on managed clusters; MCO already exists for ACM; new component: one deployment (or CronJob) sized as above.

Rough comparison (for planning only):

| Scenario | CMMO total (N clusters) | New component (single) | Net change |
| --- | --- | --- | --- |
| 10 clusters | 10 × (e.g. 100m CPU, 256 Mi) ≈ 1 core, 2.5 Gi | 500m, 1 Gi | Save ~0.5 core, ~1.5 Gi across clusters |
| 50 clusters | 50 × same ≈ 5 core, 12.5 Gi | 2 core, 4 Gi | Save ~3 core, ~8.5 Gi |
| 100 clusters | 100 × same ≈ 10 core, 25 Gi | 4 core, 8 Gi | Save ~6 core, ~17 Gi |

(Exact CMMO numbers should be taken from CMMO's own resource requests/limits and multiplied by N.)

3.4 How to determine required resources in practice

  1. PoC: Deploy the new component against a Thanos with Koku metrics; run for 1–2 clusters, then 10, then 50; measure peak CPU/memory and run duration per cycle.
  2. Load/scale tests: Simulate many clusters (e.g. MCO simulator pattern in multicluster-observability-operator tools/simulator); vary pods per cluster and measure component and Thanos load.
  3. Document: Update resource-requirements.md (and any runbooks) with recommended requests/limits per tier and a formula or table: clusters + tenants + pods → suggested CPU/memory. Note that MCO/Thanos scale separately (see MCO docs/scale-perf.md); the new component is an additional consumer of Thanos Query.

4. Work-item and research split (for agents)

After agreeing on the architecture, split work as follows. Each bullet can be assigned to a different agent or sprint.

Option A – Thanos-to-Koku service

  • A1 – Thanos Query API: Document Thanos Query (and Store API) usage: endpoints, auth, time ranges, and label selectors (especially cluster_id). Proof-of-concept: list clusters and query one cluster's metrics for a 6h window.
  • A2 – Metrics → CSV mapping: For each CMMO report type (pod_usage, storage_usage, node_labels, namespace_labels), list Prometheus/Thanos metric names and labels; write mapping to Koku CSV columns (reference: koku/docs/architecture/csv-processing-ocp.md). Identify any gaps (e.g. recording rules).
  • A3 – Transform + pack: Design or implement: query result → DataFrame/CSV → manifest.json → tar.gz. Validate output against Koku's parse_manifest and report processor expectations.
  • A4 – S3 + Kafka producer: Same contract as ingress: upload tar.gz to S3, generate presigned URL, produce platform.upload.announce (hccm) with org_id from DB. Reuse Koku S3 path conventions and Kafka producer utils.
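The A3 "transform + pack" step can be sketched entirely with the standard library. The CSV columns and file-naming scheme below are placeholders; the real columns must match koku's OCP CSV schema (A2's mapping work):

```python
import csv
import io
import json
import tarfile

def pack_payload(cluster_id: str, rows: list, manifest_uuid: str) -> bytes:
    """Build a tar.gz containing manifest.json plus one CSV, entirely in memory."""
    csv_name = f"{cluster_id}_pod_usage.csv"  # placeholder naming scheme
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)

    manifest = {"uuid": manifest_uuid, "cluster_id": cluster_id, "files": [csv_name]}

    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w:gz") as tar:
        for name, data in (
            ("manifest.json", json.dumps(manifest).encode()),
            (csv_name, buf.getvalue().encode()),
        ):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()

payload = pack_payload(
    "my-cluster",
    [{"namespace": "default", "pod": "web-1", "cpu_usage": "0.25"}],
    "3c6e4d0a-0000-0000-0000-000000000000",
)
```

The resulting bytes would then be uploaded to S3 (A4) and announced on Kafka with the org_id resolved from the koku DB.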

Option B – MCO export to object storage

  • B1 – MCO export design: Research whether MCO (or Observatorium) can export "cost/usage" data in a defined schema to object storage. Document APIs or controller changes needed.
  • B2 – Discovery and manifest: Design how the new component discovers new export files (list bucket, events, or CRs) and builds manifest.json + Kafka message with org_id.

Option C – MCOA metrics allowlist

  • C1 – Allowlist for Koku/ROS: In MCO/MCOA repos, list exact metrics (and labels) CMMO uses; add them to metrics_allowlist.yaml (legacy) or ScrapeConfig (MCOA). Verify cluster_id (or equivalent) is present on series.
  • C2 – Recording rules (optional): If raw metrics cardinality is too high, design recording rules in MCO that pre-aggregate to a "cost usage" schema and ensure they are stored in Thanos.

Cross-cutting

  • Org_id / tenant: Implement or document DB query: cluster_id → Sources → org_id (and account). Ensure component only publishes for clusters that have a Source; document bootstrap (user adds cluster before data flows).
  • Scheduling: Decide Celery Beat vs CronJob; implement scheduled job. Document per-tenant schedule as future work.
  • Resource sizing: Run PoC and scale tests; update sizing matrix and resource-requirements.md.
  • E2E test: One managed cluster, MCO sending metrics to Thanos; component runs once; Koku consumes message and processes report; verify data in DB and (if applicable) ROS.

5. References
