10:34:23 - Create an endpoint (prefix it with wrapped_) to sum all requests, and add a parameter to filter by organization_id. Use a query like this:
SELECT *
FROM
(
    SELECT
        region, organization_id, workspace_id, pipe_id, pipe_name,
        toStartOfMonth(timestamp) AS timestamp,
        sumMerge(request_sum)     AS request_sum,
        sumMerge(error_sum)       AS error_sum,
        sumMerge(read_bytes_sum)  AS read_bytes_sum,
        sumMerge(read_rows_sum)   AS read_rows_sum
    FROM mv_pipe_stats_daily
    WHERE timestamp >= '2025-12-01'
      AND pipe_id = '<pipe_id>'
      AND organization_id = '<org_id>'
      AND workspace_id = '<w_id>'
    GROUP BY region, organization_id, workspace_id, pipe_id, pipe_name, timestamp

    UNION ALL

    SELECT
        region, organization_id, workspace_id, pipe_id, pipe_name, timestamp,
        request_sum, error_sum, read_bytes_sum, read_rows_sum
    FROM historic_monthly_pipe_stats
    WHERE pipe_id = '<pipe_id>'
      AND organization_id = '<org_id>'
      AND workspace_id = '<w_id>'
)
ORDER BY timestamp DESC
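A minimal sketch of what the resulting wrapped_ endpoint node could look like, assuming Tinybird's .pipe node syntax and template functions; the node name, the month-level output shape, and the parameter handling are assumptions, not taken from the log:

    NODE endpoint
    SQL >
        %
        SELECT
            toStartOfMonth(timestamp) AS month,
            sum(request_sum)          AS total_requests
        FROM
        (
            SELECT toStartOfMonth(timestamp) AS timestamp, sumMerge(request_sum) AS request_sum
            FROM mv_pipe_stats_daily
            WHERE timestamp >= '2025-12-01'
              AND organization_id = {{ String(organization_id) }}
            GROUP BY timestamp

            UNION ALL

            SELECT timestamp, request_sum
            FROM historic_monthly_pipe_stats
            WHERE organization_id = {{ String(organization_id) }}
        )
        GROUP BY month
        ORDER BY month DESC

    TYPE endpoint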
10:56:31 - Modify wrapped_pipe_stats_summary to split the request aggregation between query_api and everything else, and also keep the total sum of requests
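A sketch of that split, assuming Query API traffic can be identified in the pipe stats by pipe_name = 'query_api' (that identifier is an assumption) and that unified_requests stands for a hypothetical node with the daily-plus-historic UNION ALL already filtered and merged:

    SELECT
        sum(request_sum)                              AS total_requests,
        sumIf(request_sum, pipe_name = 'query_api')   AS query_api_requests,
        sumIf(request_sum, pipe_name != 'query_api')  AS other_requests
    FROM unified_requests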
11:06:41 - I need to add the day with the most requests, and the number of requests on that day; use mv_pipe_stats_daily to calculate it and add it to the wrapped_pipe_stats_summary pipe
11:11:31 - What's the day with the most requests for the organization in the last year, and the number of requests on that day?
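Both prompts reduce to a top-1 query over the daily rollup; a sketch using only tables and columns that appear in the queries above:

    SELECT
        toDate(timestamp)     AS day,
        sumMerge(request_sum) AS requests
    FROM mv_pipe_stats_daily
    WHERE organization_id = '<org_id>'
      AND timestamp >= today() - INTERVAL 1 YEAR
    GROUP BY day
    ORDER BY requests DESC
    LIMIT 1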
11:17:36 - Add to the endpoint the total storage using raw_storage, and the workspace or data source name with the most storage
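The schema of raw_storage is not shown in this log, so the column names below (workspace_name, datasource_name, bytes) are assumptions; the idea is to take the latest reading per data source, sum those for the organization total, and keep the largest contributor. The per-workspace variant would group by workspace_name first instead:

    SELECT
        sum(latest_bytes)                     AS total_storage_bytes,
        argMax(datasource_name, latest_bytes) AS top_datasource
    FROM
    (
        SELECT
            workspace_name,
            datasource_name,
            argMax(bytes, timestamp) AS latest_bytes  -- most recent storage reading per data source
        FROM raw_storage
        WHERE organization_id = '<org_id>'
        GROUP BY workspace_name, datasource_name
    )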
11:31:53 - Add to the endpoint the total ingested data (written_bytes) and a breakdown by the top 3 ingestion types (kafka, hfi, etc.)
11:36:37 - Do not use raw_datasources_ops_log; use an aggregated version. Do not filter by event_type, just group by it and limit to the top 3.
11:38:28 - For datasources_ops we have this option; do the UNION ALL filtering by dates and organization_id, and do not group by the rest of the dimensions:
SELECT *
FROM
(
    SELECT
        region, organization_id, workspace_id, workspace_name, event_type, timestamp,
        total_operations, read_rows, read_bytes, written_rows, written_bytes
    FROM historic_monthly_datasources_ops_log
    WHERE workspace_id = '<w_id>'
      AND event_type = 'copy'

    UNION ALL

    SELECT
        region, organization_id, workspace_id, workspace_name, event_type,
        toStartOfMonth(timestamp) AS timestamp,
        sumMerge(total_operations),
        sumMerge(read_rows),
        sumMerge(read_bytes),
        sumMerge(written_rows),
        sumMerge(written_bytes)
    FROM mv_datasources_ops_log_daily
    WHERE workspace_id = '<w_id>'
      AND event_type = 'copy'
      AND timestamp >= '2025-12-01'
    GROUP BY region, organization_id, workspace_id, workspace_name, event_type, timestamp
)
ORDER BY timestamp DESC
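Adapting that shape to the ingestion breakdown asked for above: filter only by organization_id and dates, group by event_type alone, and keep the top 3 by written_bytes (dropping the LIMIT and the event_type grouping gives the overall ingested total). A sketch, assuming the two sources carry the same columns shown in the query above:

    SELECT
        event_type,
        sum(written_bytes) AS ingested_bytes
    FROM
    (
        SELECT event_type, written_bytes
        FROM historic_monthly_datasources_ops_log
        WHERE organization_id = '<org_id>'

        UNION ALL

        SELECT event_type, sumMerge(written_bytes) AS written_bytes
        FROM mv_datasources_ops_log_daily
        WHERE organization_id = '<org_id>'
          AND timestamp >= '2025-12-01'
        GROUP BY event_type
    )
    GROUP BY event_type
    ORDER BY ingested_bytes DESC
    LIMIT 3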
11:44:44 - Add the fastest p99 endpoint duration in the last year as well
12:03:46 - Can we calculate the quantile of the total number of requests (compared to all requests from all organizations) and the quantile of row counts (both read and write)? If needed, use only the monthly data sources
12:05:37 - Remove fastest_p99_endpoint and the related column
12:11:45 - The endpoint is timing out; except for the historic tables, make sure to filter by timestamp >= '2025-01-01'
12:17:25 - Add the total written rows; use the historic and daily datasources_ops_log, filtering by date as usual
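A sketch of the written-rows total following the same historic-plus-daily pattern (table and column names taken from the datasources_ops query above):

    SELECT sum(written_rows) AS total_written_rows
    FROM
    (
        SELECT written_rows
        FROM historic_monthly_datasources_ops_log
        WHERE organization_id = '<org_id>'

        UNION ALL

        SELECT sumMerge(written_rows) AS written_rows
        FROM mv_datasources_ops_log_daily
        WHERE organization_id = '<org_id>'
          AND timestamp >= '2025-01-01'
    )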
12:24:32 - We need a different approach for the quantiles analysis to avoid reading too much data: use wrapped_active_organizations to get the top 20 orgs by requests and then filter by those organizations. Filter by dates so that the monthly and daily data sources filter by 2025-12-01, and use other optimizations as needed
12:33:20 - Is the quantile calculation correct?
12:34:37 - I want to know which quantile/decile the org falls into (e.g., "top 10%", "top 25%")
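For reference, one way to turn an organization's rank among the sampled organizations into a "top N%" label; top_org_requests is a hypothetical node holding one row per organization with its request total for the top 20 orgs from wrapped_active_organizations:

    SELECT
        organization_id,
        concat('top ',
               toString(toUInt32(ceil(100 * rank() OVER (ORDER BY requests DESC) / count() OVER ()))),
               '%') AS requests_percentile
    FROM top_org_requests  -- hypothetical: one row per org with its request total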
12:38:29 - Is the percentile calculation right?
13:15:40 - Remove the quantiles calculation
13:19:34 - Add the top 3 workspace names and the percentage of the organization's total read rows they represent; same for the total written rows (or bytes)
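A sketch for the read-rows side; unified_ops stands for a hypothetical node with the historic-plus-daily UNION ALL already filtered by date and with aggregate states merged, and the written-rows (or bytes) variant is identical with written_rows swapped in:

    SELECT
        workspace_name,
        sum(read_rows) AS ws_read_rows,
        round(100 * ws_read_rows /
              (SELECT sum(read_rows) FROM unified_ops WHERE organization_id = '<org_id>'), 2)
            AS pct_of_org_read_rows
    FROM unified_ops
    WHERE organization_id = '<org_id>'
    GROUP BY workspace_name
    ORDER BY ws_read_rows DESC
    LIMIT 3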
14:15:08 - Create a table (wrapped_organizations) to map each organization_id to another random ID to expose publicly, then create a copy pipe (to run on demand) that maps all organization_ids from dim_organizations_all_rt to wrapped_organizations. Whenever I run the copy pipe it should append only the organization_ids not already present in wrapped_organizations, and not modify the rest
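One possible shape for this, assuming Tinybird .datasource and copy-pipe datafile syntax; using generateUUIDv4() as the "random ID" is an assumption:

    # wrapped_organizations.datasource
    SCHEMA >
        `organization_id` String,
        `wrapped_id` String

    ENGINE "MergeTree"
    ENGINE_SORTING_KEY "organization_id"

    # map_wrapped_organizations.pipe
    NODE map_new_organizations
    SQL >
        SELECT
            organization_id,
            toString(generateUUIDv4()) AS wrapped_id
        FROM (SELECT DISTINCT organization_id FROM dim_organizations_all_rt)
        WHERE organization_id NOT IN (SELECT organization_id FROM wrapped_organizations)

    TYPE COPY
    TARGET_DATASOURCE wrapped_organizations
    COPY_MODE append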
14:20:37 - Now modify the wrapped_pipe_stats_summary endpoint to receive wrapped_id as a parameter (same Array type) instead of organization_id; take into account that we need to resolve the right organization_ids
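The parameter swap could resolve wrapped_id back to organization_ids with a subquery against the mapping table; a sketch of the filter clause, assuming Tinybird's Array template function:

    WHERE organization_id IN
    (
        SELECT organization_id
        FROM wrapped_organizations
        WHERE wrapped_id IN {{ Array(wrapped_id, 'String') }}
    )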
11:12:40 - wrapped_pipe_stats_summary is returning the actual metrics multiplied by two; extract the requests_sum calculation to check what's going on
11:12:47 - Query the wrapped_pipe_stats_summary endpoint to examine the requests_sum calculation. Show me the SQL logic in the endpoint node where the final aggregation happens, particularly focusing on how request_sum is being calculated and if there are any UNION ALL operations that might be causing duplication.
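The usual suspect for an exact x2 is the UNION ALL counting the same period twice, i.e. the daily rollup and the historic monthly table both contributing rows for the same months. One way to check for that overlap, using only tables from the queries above:

    SELECT
        month,
        countIf(src = 'daily')    AS daily_rows,
        countIf(src = 'historic') AS historic_rows
    FROM
    (
        SELECT toStartOfMonth(timestamp) AS month, 'daily' AS src
        FROM mv_pipe_stats_daily
        WHERE organization_id = '<org_id>' AND timestamp >= '2025-01-01'

        UNION ALL

        SELECT toStartOfMonth(timestamp) AS month, 'historic' AS src
        FROM historic_monthly_pipe_stats
        WHERE organization_id = '<org_id>'
    )
    GROUP BY month
    HAVING daily_rows > 0 AND historic_rows > 0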
11:14:14 - Use this wrapped_id