Plan for replacing CMMO + Ingress with ACM MCO/Thanos for cost-onprem-chart (OCP cost/usage data). Use this document to agree on architecture, split into work items, and assign research per option to different agents.
Status: Planning
Date: 2025-02-15
Deciders: Cost/Insights on-premise architecture
Cost-onprem is increasingly deployed on top of ACM (Advanced Cluster Management / Stolostron), which already runs MCO (multicluster-observability-operator) and Thanos to collect metrics from all managed clusters. Running CMMO on every cluster duplicates collection and adds operational overhead for those managed by ACM.
Design objectives (to be confirmed with product):
- When cost-onprem-chart is installed on top of ACM, the deployment keeps the same overall architecture as today, including insights-ingress. Only the data path for ACM-managed clusters changes: those clusters use MCO/Thanos and the Thanos Bridge instead of CMMO.
- For other clusters that must report to cost-management (e.g. the hub cluster, or any cluster not managed by ACM), CMMO remains the tool; they continue to report via CMMO → insights-ingress.
- Remove CMMO and insights-ingress from the cost/ROS data path only for clusters that report their metrics to Thanos (whether via MCO or any other mechanism). The option to use CMMO and insights-ingress remains available for clusters that are not managed by ACM, or for the ACM hub itself.
- Keep koku's existing consumer and OCP processing pipeline unchanged (no breaking contract).
- Resolve org_id without JWT for the Thanos path (MCO has no user identity).
If product decides otherwise: we may need to ensure that in ACM all clusters (including the hub) report their metrics via MCO/Thanos with the same data as managed clusters, so that CMMO and insights-ingress could be removed from the cost path entirely. That would require hub (and any non-managed cluster) to be part of the observability stack and would be a different scope.
The Thanos-based data-ingestion path is added in addition to the existing CMMO-equivalent path. It does not replace it. Clusters that are not managed by ACM continue to report metrics via CMMO and insights-ingress; only clusters reporting to Thanos (when the feature is enabled) use the Thanos Bridge path. Both paths feed the same Kafka topic and koku consumer.
We will:
- Extend MCO so Thanos receives the same metrics CMMO collects (pod, node, namespace, storage, VM, ROS), via allowlist and rules in multicluster-observability-operator, with a stable cluster label for filtering.
- Add a "Thanos Bridge" inside masu (same codebase/repo): a scheduled job that:
- Lists OCP sources from the koku DB (cluster_id, org_id).
- For each cluster and a configurable time window: queries Thanos (PromQL), transforms Prometheus time series into CMMO-equivalent CSVs, builds manifest.json and tar.gz, uploads to S3, and publishes to platform.upload.announce with Kafka header service: "hccm" and the same JSON body as ingress (url, request_id, org_id, b64_identity).
- Uses org_id only from koku (Sources table); no MCO changes for auth.
- Keep the koku contract unchanged: same topic, same message shape, same payload format (tar.gz + manifest.json + CSVs). The existing Kafka message handler and OCP processors are not modified.
- Idempotency: Use a deterministic manifest UUID per (cluster_id, time_window) (e.g. UUID5); koku's `get_or_create` and `record_report_status` make re-runs idempotent. Optional bridge-side "last processed" cursor per cluster for efficiency.
- Feature toggles (cost-onprem-chart): Add ingress.enabled and thanosBridge.enabled. Enable thanosBridge.enabled for Thanos-reporting clusters (do not install CMMO on those). Keep ingress.enabled true when the customer has non-Thanos clusters that must report via the existing CMMO → insights-ingress path; it may be false when all clusters report to Thanos.
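A minimal sketch of the deterministic manifest-UUID idea. The namespace string and key format here are our own illustrative choices, not an existing koku convention:

```python
import uuid

# Fixed namespace for bridge manifest IDs (illustrative value; the real
# namespace UUID would be chosen once and never changed afterwards).
MANIFEST_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "cost-onprem.thanos-bridge")

def manifest_uuid(cluster_id: str, window_start: str, window_end: str) -> uuid.UUID:
    """Deterministic UUID5 per (cluster_id, time_window): re-running the
    bridge for the same window produces the same manifest UUID, so koku's
    get_or_create treats the re-run as the same manifest (idempotent)."""
    return uuid.uuid5(MANIFEST_NAMESPACE, f"{cluster_id}:{window_start}:{window_end}")
```

Same inputs always produce the same UUID, while different clusters or windows produce different ones.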
Rejected alternatives:
- Thin proxy in front of Thanos: Extra service and network hop; org_id still must come from koku. Deferred unless operational boundaries require a separate service.
- MCO writes report blobs to S3: Dual-write and more MCO/collector changes; two formats to maintain. Not chosen for first phase.
Positive:
- Single source of metrics (Thanos) for cost and ROS for Thanos-reporting clusters.
- For those clusters, CMMO and insights-ingress are not in the path; for the hub and other non-Thanos clusters, CMMO → insights-ingress remains; overall architecture (including insights-ingress) is unchanged.
- No changes to koku's consumer or OCP pipeline; same S3/Kafka contract.
- Tenant (org_id) is explicit and auditable from koku DB.
Negative / Risks:
- Bridge must mirror CMMO semantics (PromQL → CSV transform); metric names/labels in Thanos must align with CMMO.
- Cluster identity mapping (Thanos cluster label ↔ koku cluster_id) must be defined and implemented; optional managed_cluster_name mapping if ACM uses different identifiers.
Follow-up:
- MCO: extend allowlist and rules; confirm cluster label.
- multicluster-observability-addon: verify it does not drop new metrics (no code change expected).
- Implementation: bridge module, scheduling (cron/Celery), cluster identity mapping, cost-onprem-chart toggles, tests, and docs.
The target state has two coexisting data paths that converge at the same Kafka topic and koku consumer. The CMMO → insights-ingress path serves clusters not reporting to Thanos (e.g. the ACM hub, non-ACM clusters). The Thanos Bridge path serves clusters whose metrics are already in Thanos.
```mermaid
flowchart TB
    subgraph NTC["Non-Thanos Clusters"]
        CMMO[CMMO]
    end
    subgraph AMC["ACM-Managed Clusters"]
        Prom[Prometheus]
        MCO[MCO metrics-collector]
    end
    subgraph Hub["Hub / Cost-onprem"]
        Ingress[Insights Ingress]
        Thanos[Thanos Query]
        Bridge[Thanos Bridge<br/>masu scheduled job]
        DB[(Koku DB<br/>Sources, Providers)]
        S3[(S3 staging)]
        Kafka[[Kafka<br/>platform.upload.announce]]
        Handler[Kafka Message Handler<br/>masu listener]
        Processor[Masu OCP Processor]
    end
    %% Path 1: CMMO
    CMMO -->|Upload tar.gz<br/>with JWT| Ingress
    Ingress -->|Stage payload| S3
    Ingress -->|Publish msg<br/>service: hccm| Kafka
    %% Path 2: Thanos Bridge
    Prom -->|Scrape| MCO
    MCO -->|Remote write| Thanos
    Bridge -->|List OCP sources<br/>cluster_id, org_id| DB
    Bridge -->|PromQL queries| Thanos
    Bridge -->|Upload tar.gz| S3
    Bridge -->|Publish msg<br/>service: hccm| Kafka
    %% Shared downstream
    Kafka --> Handler
    Handler -->|Download payload| S3
    Handler -->|Resolve tenant| DB
    Handler --> Processor
```
Both data paths are shown below. Path 1 is the existing CMMO flow for clusters not reporting to Thanos. Path 2 is the new Thanos Bridge flow for Thanos-reporting clusters. Both produce the same Kafka message and payload format; the downstream handler and processor are unchanged.
```mermaid
sequenceDiagram
    box Non-Thanos Cluster
        participant CMMO as CMMO
    end
    box ACM-Managed Cluster
        participant MCO as MCO metrics-collector
    end
    box Hub / Cost-onprem
        participant Ingress as Insights Ingress
        participant Thanos as Thanos Query
        participant Bridge as Thanos Bridge (masu)
        participant DB as Koku DB
        participant S3 as S3 staging
        participant Kafka as Kafka
        participant Handler as Kafka Msg Handler
        participant Processor as Masu Processor
    end
    Note over CMMO,Ingress: Path 1 — CMMO (hub, non-ACM clusters)
    CMMO->>CMMO: Collect Prometheus metrics, build CSVs + manifest, package tar.gz
    CMMO->>Ingress: Upload tar.gz (JWT with org_id)
    Ingress->>S3: Stage payload
    S3-->>Ingress: url
    Ingress->>Kafka: Publish (service: hccm, org_id, url, request_id, b64_identity)
    Note over MCO,Bridge: Path 2 — Thanos Bridge (ACM-managed clusters)
    MCO->>Thanos: Remote write metrics
    Note over Bridge: Scheduled
    Bridge->>DB: List OCP sources (cluster_id, org_id)
    loop Per Thanos-reporting cluster
        Bridge->>Thanos: PromQL (pod, node, storage, VM, ROS)
        Thanos-->>Bridge: Time series
        Bridge->>Bridge: Transform to CMMO-equivalent CSVs + manifest.json
        Bridge->>Bridge: Package tar.gz
        Bridge->>S3: Upload tar.gz
        S3-->>Bridge: url
        Bridge->>Kafka: Publish (service: hccm, org_id, url, request_id, b64_identity="")
    end
    Note over Kafka,Processor: Shared downstream — unchanged
    Kafka->>Handler: Consume message
    Handler->>S3: Download payload
    Handler->>Handler: extract_payload, parse manifest
    Handler->>DB: get_source_and_provider_from_cluster_id(cluster_id, org_id)
    Handler->>Processor: process_report (unchanged)
```
- CMMO on each OCP cluster
  - Queries Prometheus (e.g. every 15 min).
  - Collects pod CPU/memory, storage (PVC), node/namespace labels, etc.
  - Periodically (e.g. every 6 h) builds a tar.gz with `manifest.json` + CSVs (pod_usage, storage_usage, node_labels, namespace_labels, etc.).
  - Authenticates with a JWT (`x-rh-identity`) that carries org_id (and optionally account).
  - POSTs the tar.gz to the ingress URL (content-type e.g. `application/vnd.redhat.hccm.*+tgz`).
- Ingress (insights-ingress-go)
  - Validates content-type and service (e.g. `hccm`).
  - Reads org_id (and account) from the JWT (`identity.GetIdentity`).
  - Stores the file in S3-compatible storage (keyed by request_id, org, etc.).
  - Generates a presigned URL to that object.
  - Publishes to Kafka topic `platform.upload.announce` a message containing `request_id`, `account`, `org_id`, `url`, `service` (e.g. `hccm`), plus size, timestamp, etc.
  - Responds to the client with 202 Accepted, the request_id, and upload metadata.
- Koku listener
  - Consumes `platform.upload.announce`, filters by header `service == "hccm"`.
  - For each message: reads org_id from the payload; downloads the tar.gz from the presigned URL; extracts `manifest.json` and the CSV files.
- Koku processing (masu)
  - Parses the manifest: `cluster_id`, `uuid`, `date`, `files[]`.
  - Resolves the provider: `get_source_and_provider_from_cluster_id(cluster_id, org_id)` → `Sources` + `Provider` (the cluster must be registered in Sources for that org_id). If there is no source: logs and skips (no tenant).
  - Creates/updates CostUsageReportManifest, copies reports to a local dir, runs create_daily_archives (splits CSVs by day, validates/sanitizes, uploads daily CSVs to S3).
  - Runs line-item processing per report file (OCP report processor), then summarization (Celery).
  - Sends validation (success/failure) to `platform.upload.validation`.
Important steps after "ingress stores and announces": Kafka → download by URL → extract manifest + CSVs → resolve cluster_id + org_id to Provider → daily archives to S3 → report processing → summarization → validation.
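The manifest the handler parses can be sketched as a small JSON document. The four fields below are the ones named above (cluster_id, uuid, date, files); a real CMMO manifest carries additional fields, so treat this as illustrative:

```python
import json
import uuid

def build_manifest(cluster_id: str, date: str, files: list[str]) -> str:
    """Build an illustrative manifest.json with the fields koku's manifest
    parsing reads (cluster_id, uuid, date, files). Real manifests include
    more metadata (version, start/end times, ...)."""
    manifest = {
        "uuid": str(uuid.uuid4()),
        "cluster_id": cluster_id,
        "date": date,
        "files": files,
    }
    return json.dumps(manifest, indent=2)
```

The bridge would package this file alongside the CSVs it lists in `files` before building the tar.gz.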
- cost-onprem-chart is deployed on ACM (stolostron).
- ACM already has MCO (multicluster-observability-operator) and Thanos: metrics from all managed clusters are collected (Metrics Collector or MCOA) and written to Thanos (Observatorium API → Thanos Receive → object storage).
- Goal: Remove CMMO and ingress from the cost/ROS data path only for clusters that report their metrics to Thanos (whether via MCO or any other mechanism). Use a new pipeline for those clusters: MCO/Thanos → Thanos Bridge → same contract as today (S3 + Kafka) so Koku stays unchanged. The existing CMMO → insights-ingress path remains available for clusters not managed by ACM and for the ACM hub itself.
- Metrics in Thanos are equivalent in content to what CMMO produces (same logical series/labels so you can reconstruct pod/node/namespace usage and labels).
- A new component runs in the cost-onprem/masu environment; it:
  - Reads from Thanos (e.g. Thanos Query),
  - Transforms that data into the same CSV + manifest format Koku expects,
  - Writes tar.gz (or equivalent) to the same S3 bucket (or same path scheme) that Koku uses,
  - Publishes a Kafka message in the same shape as ingress (`platform.upload.announce`, `service: hccm`, `request_id`, `account`, `org_id`, `url`).
- Koku keeps consuming `platform.upload.announce` (service `hccm`), downloading by URL, and running the existing extraction + provider resolution + processing. No change to the "after Kafka" data flow.
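The message shape the bridge must reproduce can be sketched as a plain dict. The field set here is the one listed above, not the full ingress message, and the empty `b64_identity` follows the sequence diagram's choice for the Thanos path (an open design point):

```python
import uuid

def build_announce_message(org_id: str, account: str, url: str) -> dict:
    """Sketch of the platform.upload.announce body the bridge publishes,
    mirroring the ingress shape (request_id, account, org_id, url,
    b64_identity). Illustrative, not exhaustive."""
    return {
        "request_id": uuid.uuid4().hex,
        "account": account,
        "org_id": org_id,    # resolved from the koku Sources table, not a JWT
        "url": url,          # presigned S3 URL to the staged tar.gz
        "b64_identity": "",  # no JWT on the Thanos path
        "service": "hccm",   # also sent as the Kafka header "service"
    }
```

Because the body matches what ingress publishes, the existing koku listener needs no changes to consume it.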
Option A – Thanos → "Thanos-to-Koku" service (recommended baseline)
- New service (e.g. under masu): scheduled (e.g. every 6 h).
- Input: Thanos Query API (PromQL) or Store API; time range and cluster_id (from Thanos external labels).
- Logic: For each (cluster_id, interval): query Thanos for the metrics that correspond to CMMO's pod_usage, storage_usage, node/namespace labels; aggregate/transform into the exact CSV column set Koku uses; build `manifest.json` + CSVs; pack the tar.gz; upload to S3; produce a Kafka message with the presigned URL and org_id (see 2.5).
- Deploy: Single instance (or one per tenant/scheduler) in the hub/cost-onprem namespace; no CMMO on managed clusters; no ingress for hccm.
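A minimal sketch of the per-cluster Thanos Query call. The cluster label name (`cluster`) and the metric name are assumptions to be confirmed in work items A1/A2; only the `/api/v1/query_range` endpoint and its parameters are standard Prometheus/Thanos API:

```python
from urllib.parse import urlencode

def thanos_query_range_url(base_url: str, metric: str, cluster: str,
                           start: int, end: int, step: str = "5m") -> str:
    """Build a Thanos Query /api/v1/query_range request for one cluster's
    series, filtered by the external cluster label MCO attaches."""
    promql = f'{metric}{{cluster="{cluster}"}}'
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"
```

The bridge would issue one such range query per (metric, cluster, time window) and page through the results.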
Option B – MCO writes "cost/usage" export to object storage
- Extend MCO (or a sidecar/controller) to periodically export from Thanos (or from the collector) into a defined format (e.g. same CSV schema) in object storage (e.g. same bucket as Thanos or a dedicated bucket).
- New component only: discovers new export files (e.g. by listing or events), moves/copies them to the path Koku expects, generates manifest.json, and publishes Kafka with URL + org_id.
- Pro: Less "query and transform" in the new component. Con: Requires MCO changes and a stable export schema and discovery story.
Option C – Hybrid: MCO addon exposes cost metrics; component queries Thanos
- multicluster-observability-addon (MCOA) (or legacy metrics collector) is extended so that the same metrics CMMO would collect (pod CPU/memory, PVC, node/namespace labels) are included in the allowlist and sent to Thanos (with cluster identity in labels).
- New component is as in Option A: query Thanos → transform → CSV + manifest → S3 + Kafka.
- This minimizes MCO changes (only allowlist + possibly recording rules) and keeps transformation in one place.
Recommendation: Option A + C — extend MCO/MCOA so Thanos has "Koku/ROS" metrics; add a single Thanos-to-Koku service that queries Thanos, produces CSV+manifest, writes S3, and publishes Kafka.
- Metrics:
  - Add to MCO's metrics allowlist (or MCOA scrape config) every metric (and label) needed to reconstruct:
    - Pod usage: CPU/memory request, limit, usage (e.g. `container_cpu_*`, `container_memory_*`, pod/node/namespace labels).
    - Storage: PVC/PV usage and capacity.
    - Node/namespace labels for cost and ROS.
  - Map these to the same semantics as CMMO's CSV columns (see koku `docs/architecture/csv-processing-ocp.md` and the cost-onprem data-sources doc).
- Thanos:
  - Ensure each series is tagged with a cluster identifier (e.g. `cluster` or `managed_cluster` from MCO) so the new component can slice by cluster and time.
- multicluster-observability-addon:
  - If using MCOA: add the same metrics to the ScrapeConfig (or equivalent) used for user workload / cost; ensure cluster_id is in the labels.
  - No change to the Thanos Receive/Store API; only to what is scraped and forwarded.
- Today: org_id comes from the JWT that CMMO sends to ingress. Ingress puts it in the Kafka message; Koku uses (cluster_id, org_id) to find `Sources` and `Provider` (`get_source_and_provider_from_cluster_id(cluster_id, org_id)`).
- With MCO: There is no JWT; metrics are pushed by the addon/collector with cluster identity, not org_id.
Design for org_id:
1. Cluster → tenant in Koku DB
   - In cost-onprem, Sources (and Provider) already bind cluster_id to org_id (tenant).
   - So the authoritative mapping is: cluster_id → org_id from Koku's Sources (and Provider) tables.
2. New component must have org_id before publishing Kafka
   - The component runs in the masu/Koku environment and has DB access.
   - For each cluster_id it wants to emit a report for:
     - Query Koku DB: get `Sources` (or Provider) where `provider.authentication.credentials.cluster_id == cluster_id` and take that source's org_id (and optionally account).
     - If no source: skip that cluster (or log and retry later); do not publish a message without org_id (Koku would drop it anyway).
   - Publish Kafka with that org_id (and account if present) so the message is identical to what ingress would send.
3. Ordering / bootstrap
   - Providers (Sources) must be created before the new component can report for that cluster (e.g. via UI/API when the user "adds" the cluster to Cost Management).
   - So: cluster registered in Sources with org_id → component can resolve cluster_id → org_id → publish message. No change to Koku's "unknown organization" or "unexpected OCP report" handling.
4. Optional: ACM as source of org_id
   - If ManagedCluster (or a custom resource) ever carries an "org_id" or "tenant" annotation, the component could use that as a hint or fallback, but Koku's source of truth should remain Sources/Provider so behavior stays consistent with today.
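The resolution rule above (publish only for clusters with a registered Source) can be sketched with plain data structures. The row shape is illustrative, not koku's actual Sources/Provider schema:

```python
def resolve_org_ids(cluster_ids, sources):
    """Map cluster_id -> org_id from rows shaped like a Sources/Provider
    join; clusters without a registered Source are skipped, never published."""
    by_cluster = {s["cluster_id"]: s["org_id"] for s in sources}
    resolved, skipped = {}, []
    for cid in cluster_ids:
        if cid in by_cluster:
            resolved[cid] = by_cluster[cid]
        else:
            skipped.append(cid)  # log and retry later; no message without org_id
    return resolved, skipped
```

Skipped clusters are exactly the "bootstrap" case: they start flowing once the user registers them in Sources.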
- Location: Implement as a service under masu (same codebase/deployment boundary): same DB, same config, same tenant model.
- Execution:
- Scheduled: e.g. Celery Beat job every N hours (e.g. 6), or a dedicated cron-like pod that runs "Thanos → transform → S3 → Kafka" for a time window.
- Future: Per-tenant schedule (e.g. different intervals per org_id) can be added later via config or DB.
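If Celery Beat is chosen, the schedule entry could look like the following. The task path and queue name are hypothetical, not existing masu task names:

```python
from datetime import timedelta

# Illustrative Celery beat entry for the bridge run; "masu.thanos_bridge.
# tasks.run_bridge" and the queue name are assumed, not real identifiers.
beat_schedule = {
    "thanos-bridge-report": {
        "task": "masu.thanos_bridge.tasks.run_bridge",
        "schedule": timedelta(hours=6),  # matches CMMO's ~6 h cadence
        "options": {"queue": "thanos_bridge"},
    },
}
```

A per-tenant schedule would later replace the single entry with one generated per org_id from config or the DB.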
| Area | Change |
|---|---|
| MCO / MCOA | Extend metrics allowlist (and scrape config) so Thanos has Koku/ROS metrics with cluster labels. |
| New component (masu) | Thanos Query → transform to CMMO-like CSV + manifest → upload tar.gz to S3 → publish platform.upload.announce (hccm) with org_id from Koku DB (cluster_id → Sources → org_id). |
| Koku | No change to Kafka consumer or processing; still requires org_id in message and registered Source per cluster. |
| Ingress | Still deployed; not used for Thanos-reporting clusters but remains for hub and non-ACM clusters using CMMO. |
| CMMO | Removed only from clusters reporting to Thanos; remains for hub and non-ACM clusters. |
- Size the new "Thanos-to-Koku" component (CPU/memory) for different cluster and tenant counts.
- Compare aggregate cost of "one MCO + one new component" vs "CMMO on every cluster."
- Work per run: For each (cluster, time_window): query Thanos → build CSVs + manifest → compress → upload S3 → one Kafka message per payload.
- Drivers: Number of clusters, pods/nodes per cluster, time window (e.g. 6 h), report types (pod, storage, labels).
Proposed sizing matrix (requests/limits):
| Clusters | Tenants (org_id) | Pods (total) | CPU request | Memory request | Notes |
|---|---|---|---|---|---|
| 5 | 1–2 | ~500 | 200m | 512 Mi | Dev/small |
| 10 | 2–5 | ~2k | 500m | 1 Gi | Small prod |
| 25 | 5–10 | ~5k | 1 | 2 Gi | Medium |
| 50 | 10–20 | ~10k | 2 | 4 Gi | Large |
| 100+ | 20+ | ~20k+ | 4 | 8 Gi | Scale test / split jobs |
- Execution: Run as scheduled job (CronJob or Celery) so average utilization is low; peak during the run.
- Parallelism: For large cluster counts, process clusters in batches (e.g. 10 at a time) to cap memory and avoid overloading Thanos.
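The batching idea is a simple chunking loop over the cluster list, processing one batch at a time to bound concurrent Thanos queries and in-memory CSV buffers:

```python
def batches(items, size):
    """Yield fixed-size batches so only `size` clusters are processed at a
    time, capping memory use and load on Thanos Query."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

For example, 25 clusters with a batch size of 10 run as three sequential batches of 10, 10, and 5.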
- Current (CMMO): Per-cluster: CMMO pod(s) (CPU/memory) + Prometheus queries + network to ingress. Ingress: 1 deployment for all clusters.
- New (MCO + component): No CMMO on managed clusters; MCO already exists for ACM; new component: one deployment (or CronJob) sized as above.
Rough comparison (for planning only):
| Scenario | CMMO total (N clusters) | New component (single) | Net change |
|---|---|---|---|
| 10 clusters | 10 × (e.g. 100m CPU, 256 Mi) ≈ 1 core, 2.5 Gi | 500m, 1 Gi | Save ~0.5 core, ~1.5 Gi across clusters |
| 50 clusters | 50 × same ≈ 5 core, 12.5 Gi | 2 core, 4 Gi | Save ~3 core, ~8.5 Gi |
| 100 clusters | 100 × same ≈ 10 core, 25 Gi | 4 core, 8 Gi | Save ~6 core, ~17 Gi |
(Exact CMMO numbers should be taken from CMMO's own resource requests/limits and multiplied by N.)
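The arithmetic behind the comparison table is just "per-cluster CMMO request × N, minus the single component's request". A sketch, using the illustrative 100m CPU / 256 Mi per-CMMO numbers from the table (substitute CMMO's real requests):

```python
def cmmo_footprint(n_clusters, cpu_millicores=100, mem_mi=256):
    """Aggregate CMMO requests across N clusters, in (cores, Gi)."""
    return n_clusters * cpu_millicores / 1000, n_clusters * mem_mi / 1024

def net_savings(n_clusters, bridge_cpu_cores, bridge_mem_gi):
    """Cores and Gi saved by replacing per-cluster CMMO with one bridge."""
    cpu, mem = cmmo_footprint(n_clusters)
    return cpu - bridge_cpu_cores, mem - bridge_mem_gi
```

With these inputs, 50 clusters against a 2-core / 4 Gi bridge yields the table's ~3 core / ~8.5 Gi saving.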
- PoC: Deploy the new component against a Thanos with Koku metrics; run for 1–2 clusters, then 10, then 50; measure peak CPU/memory and run duration per cycle.
- Load/scale tests: Simulate many clusters (e.g. the MCO simulator pattern in multicluster-observability-operator `tools/simulator`); vary pods per cluster and measure component and Thanos load.
- Document: Update resource-requirements.md (and any runbooks) with recommended requests/limits per tier and a formula or table: clusters + tenants + pods → suggested CPU/memory. Note that MCO/Thanos scale separately (see MCO `docs/scale-perf.md`); the new component is an additional consumer of Thanos Query.
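The "clusters → suggested CPU/memory" lookup could start as a simple tier table mirroring the sizing matrix above; the tiers are the proposed values and should be revisited after PoC measurements:

```python
def suggested_resources(n_clusters):
    """Look up the proposed sizing-matrix tier by cluster count and return
    (cpu_request, memory_request) strings for the chart's values."""
    tiers = [(5, "200m", "512Mi"), (10, "500m", "1Gi"),
             (25, "1", "2Gi"), (50, "2", "4Gi")]
    for max_clusters, cpu, mem in tiers:
        if n_clusters <= max_clusters:
            return cpu, mem
    return "4", "8Gi"  # 100+ tier: consider splitting into multiple jobs
```

A later refinement could weight by tenants and total pods rather than cluster count alone.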
After agreeing on the architecture, split work as follows. Each bullet can be assigned to a different agent or sprint.
- A1 – Thanos Query API: Document Thanos Query (and Store API) usage: endpoints, auth, time ranges, and label selectors (especially cluster_id). Proof-of-concept: list clusters and query one cluster's metrics for a 6h window.
- A2 – Metrics → CSV mapping: For each CMMO report type (pod_usage, storage_usage, node_labels, namespace_labels), list the Prometheus/Thanos metric names and labels; write the mapping to Koku CSV columns (reference: `koku/docs/architecture/csv-processing-ocp.md`). Identify any gaps (e.g. recording rules).
- A3 – Transform + pack: Design or implement: query result → DataFrame/CSV → manifest.json → tar.gz. Validate the output against Koku's `parse_manifest` and report processor expectations.
- A4 – S3 + Kafka producer: Same contract as ingress: upload the tar.gz to S3, generate a presigned URL, produce `platform.upload.announce` (hccm) with org_id from the DB. Reuse Koku's S3 path conventions and Kafka producer utils.
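The core of A3 is flattening a Prometheus `query_range` "matrix" result into CSV rows. The column names below are illustrative placeholders; A2 defines the real CMMO-equivalent column set:

```python
import csv
import io

def matrix_to_csv(result):
    """Flatten a Prometheus query_range matrix result (list of series with
    "metric" labels and "values" [timestamp, value] pairs) into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["interval_start", "namespace", "pod", "value"])
    for series in result:
        labels = series["metric"]
        for ts, value in series["values"]:
            writer.writerow([ts, labels.get("namespace", ""),
                             labels.get("pod", ""), value])
    return buf.getvalue()
```

In the real bridge this step would also handle aggregation windows and per-day splitting before packaging.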
- B1 – MCO export design: Research whether MCO (or Observatorium) can export "cost/usage" data in a defined schema to object storage. Document APIs or controller changes needed.
- B2 – Discovery and manifest: Design how the new component discovers new export files (list bucket, events, or CRs) and builds manifest.json + Kafka message with org_id.
- C1 – Allowlist for Koku/ROS: In MCO/MCOA repos, list exact metrics (and labels) CMMO uses; add them to metrics_allowlist.yaml (legacy) or ScrapeConfig (MCOA). Verify cluster_id (or equivalent) is present on series.
- C2 – Recording rules (optional): If raw metrics cardinality is too high, design recording rules in MCO that pre-aggregate to a "cost usage" schema and ensure they are stored in Thanos.
- Org_id / tenant: Implement or document DB query: cluster_id → Sources → org_id (and account). Ensure component only publishes for clusters that have a Source; document bootstrap (user adds cluster before data flows).
- Scheduling: Decide Celery Beat vs CronJob; implement scheduled job. Document per-tenant schedule as future work.
- Resource sizing: Run PoC and scale tests; update sizing matrix and resource-requirements.md.
- E2E test: One managed cluster, MCO sending metrics to Thanos; component runs once; Koku consumes message and processes report; verify data in DB and (if applicable) ROS.
- Koku: `koku/masu/external/kafka_msg_handler.py`, `koku/masu/util/ocp/common.py` (`get_source_and_provider_from_cluster_id`), `docs/architecture/csv-processing-ocp.md`
- Ingress: `insights-ingress-go/internal/upload/upload.go`, `README.md` (message format)
- Cost-onprem: `docs/operations/cost-management-data-sources.md`, `docs/operations/resource-requirements.md`
- MCO: multicluster-observability-operator README, `docs/scale-perf.md`; multicluster-observability-addon
- Stolostron: https://github.com/orgs/stolostron/repositories