
@alrocar
Created January 13, 2026 20:09
Tinybird Code prompt session for Tinybird Wrapped

User Prompts Timeline

December 12, 2025

10:34:23 - Create an endpoint (prefixed with wrapped_) to sum all requests, and add a parameter to filter by organization_id. Use a query like this:

SELECT *
FROM (
    SELECT
        region, organization_id, workspace_id, pipe_id, pipe_name,
        toStartOfMonth(timestamp) AS timestamp,
        sumMerge(request_sum) AS request_sum,
        sumMerge(error_sum) AS error_sum,
        sumMerge(read_bytes_sum) AS read_bytes_sum,
        sumMerge(read_rows_sum) AS read_rows_sum
    FROM mv_pipe_stats_daily
    WHERE timestamp >= '2025-12-01'
      AND pipe_id = '<pipe_id>'
      AND organization_id = '<org_id>'
      AND workspace_id = '<w_id>'
    GROUP BY region, organization_id, workspace_id, pipe_id, pipe_name, timestamp

    UNION ALL

    SELECT
        region, organization_id, workspace_id, pipe_id, pipe_name, timestamp,
        request_sum, error_sum, read_bytes_sum, read_rows_sum
    FROM historic_monthly_pipe_stats
    WHERE pipe_id = '<pipe_id>'
      AND organization_id = '<org_id>'
      AND workspace_id = '<w_id>'
)
ORDER BY timestamp DESC
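
A minimal sketch of what the resulting endpoint node could look like, assuming Tinybird's template syntax for query parameters; the node name, parameter handling, and the decision to filter only by organization_id are illustrative, not taken from the session:

    NODE wrapped_pipe_stats_summary
    SQL >
        %
        SELECT *
        FROM (
            SELECT
                region, organization_id, workspace_id, pipe_id, pipe_name,
                toStartOfMonth(timestamp) AS timestamp,
                sumMerge(request_sum) AS request_sum,
                sumMerge(error_sum) AS error_sum,
                sumMerge(read_bytes_sum) AS read_bytes_sum,
                sumMerge(read_rows_sum) AS read_rows_sum
            FROM mv_pipe_stats_daily
            WHERE timestamp >= '2025-12-01'
              AND organization_id = {{ String(organization_id) }}
            GROUP BY region, organization_id, workspace_id, pipe_id, pipe_name, timestamp

            UNION ALL

            SELECT
                region, organization_id, workspace_id, pipe_id, pipe_name, timestamp,
                request_sum, error_sum, read_bytes_sum, read_rows_sum
            FROM historic_monthly_pipe_stats
            WHERE organization_id = {{ String(organization_id) }}
        )
        ORDER BY timestamp DESC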

10:56:31 - Modify wrapped_pipe_stats_summary to aggregate requests by query_api versus everything else, and also keep the total sum of requests
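
One way to split requests into query_api versus everything else while keeping the overall total is a window sum over the grouped result. This is only a sketch: pipe_stats_unioned is a hypothetical node holding the merged daily and historic rows, and matching on pipe_name = 'query_api' is an assumption about how Query API traffic shows up in the stats:

    SELECT
        request_type,
        requests,
        sum(requests) OVER () AS total_requests       -- total across both buckets
    FROM (
        SELECT
            if(pipe_name = 'query_api', 'query_api', 'other') AS request_type,
            sum(request_sum) AS requests
        FROM pipe_stats_unioned                       -- hypothetical node with the merged rows
        GROUP BY request_type
    )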

11:06:41 - I need to add the day with the most requests and the number of requests on that day; use mv_pipe_stats_daily to calculate it and add it to the wrapped_pipe_stats_summary pipe
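
A sketch of the busiest-day calculation over the daily aggregate; the '<org_id>' placeholder and the date cutoff are illustrative:

    SELECT
        toDate(timestamp) AS day,
        sumMerge(request_sum) AS requests
    FROM mv_pipe_stats_daily
    WHERE organization_id = '<org_id>'
      AND timestamp >= '2025-01-01'
    GROUP BY day
    ORDER BY requests DESC
    LIMIT 1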

11:11:31 - What's the day with the most requests for the organization in the last year, and the number of requests on that day?

11:17:36 - Add to the endpoint the total storage using raw_storage and the workspace or data source name with the most storage
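
A sketch of the storage part. The column names (bytes, workspace_name, datasource_name, timestamp) and the snapshot semantics of raw_storage are assumptions; the idea is to take the latest snapshot per data source and then aggregate:

    WITH latest_per_datasource AS (
        SELECT
            workspace_name,
            datasource_name,
            argMax(bytes, timestamp) AS ds_bytes      -- most recent storage snapshot per data source
        FROM raw_storage
        WHERE organization_id = '<org_id>'
        GROUP BY workspace_name, datasource_name
    )
    SELECT
        sum(ds_bytes) AS total_storage_bytes,
        argMax(datasource_name, ds_bytes) AS top_datasource,
        max(ds_bytes) AS top_datasource_bytes
    FROM latest_per_datasource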

11:31:53 - Add to the endpoint the total ingested data (written_bytes) and a breakdown by the top 3 ingestion types (kafka, hfi, etc.)

11:36:37 - Do not use raw_datasources_ops_log; use an aggregated version. Do not filter by event_type, just group by it and limit to the top 3.
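
A sketch of the top-3 ingestion types over the daily aggregate only; the full version would also UNION ALL the historic monthly table, as the next prompt shows. Placeholder and date cutoff are illustrative:

    SELECT
        event_type,
        sumMerge(written_bytes) AS ingested_bytes
    FROM mv_datasources_ops_log_daily
    WHERE organization_id = '<org_id>'
      AND timestamp >= '2025-12-01'
    GROUP BY event_type
    ORDER BY ingested_bytes DESC
    LIMIT 3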

11:38:28 - For datasources_ops we have this option; do the UNION ALL filtering by dates and organization_id, and do not group by the rest of the dimensions:

SELECT *
FROM (
    SELECT
        region, organization_id, workspace_id, workspace_name, event_type, timestamp,
        total_operations, read_rows, read_bytes, written_rows, written_bytes
    FROM historic_monthly_datasources_ops_log
    WHERE workspace_id = '<w_id>'
      AND event_type = 'copy'

    UNION ALL

    SELECT
        region, organization_id, workspace_id, workspace_name, event_type,
        toStartOfMonth(timestamp) AS timestamp,
        sumMerge(total_operations),
        sumMerge(read_rows),
        sumMerge(read_bytes),
        sumMerge(written_rows),
        sumMerge(written_bytes)
    FROM mv_datasources_ops_log_daily
    WHERE workspace_id = '<w_id>'
      AND event_type = 'copy'
      AND timestamp >= '2025-12-01'
    GROUP BY region, organization_id, workspace_id, workspace_name, event_type, timestamp
)
ORDER BY timestamp DESC

11:44:44 - Add the fastest p99 endpoint duration in the last year as well
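
The daily aggregate shown above has no duration column, so a query like this would have to read from a source that records per-request latency; pipe_stats_rt with duration and start_datetime columns is an assumption here, and this metric was later removed (see the 12:05 prompt):

    WITH p99_per_pipe AS (
        SELECT
            pipe_name,
            quantile(0.99)(duration) AS p99_duration
        FROM pipe_stats_rt                  -- assumed source with a per-request duration column
        WHERE organization_id = '<org_id>'
          AND start_datetime >= '2025-01-01'
        GROUP BY pipe_name
    )
    SELECT
        argMin(pipe_name, p99_duration) AS fastest_p99_endpoint,
        min(p99_duration) AS fastest_p99_duration
    FROM p99_per_pipe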

12:03:46 - Can we calculate the quantile of the total number of requests (compared to all requests from all organizations) and the quantile of row count (both read and write)? If needed, use only the monthly data sources

12:05:37 - Remove fastest_p99_endpoint and the related column

12:11:45 - The endpoint is timing out; except for the historic tables, make sure to filter by timestamp >= '2025-01-01'

12:17:25 - Add the total written rows; use the historic and daily datasources_ops_log, filtering by date as usual

12:24:32 - We need a different approach for the quantiles analysis to avoid reading too much data: use wrapped_active_organizations to select the top 20 orgs by requests and then filter by those organizations. Filter by dates so that the monthly and daily data sources filter by 2025-12-01, and use other optimizations as needed
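
A sketch of the pre-filtering idea; it assumes wrapped_active_organizations exposes a requests column to rank by, which is not confirmed in the session:

    WITH top_orgs AS (
        SELECT organization_id
        FROM wrapped_active_organizations
        ORDER BY requests DESC              -- ranking column is an assumption
        LIMIT 20
    )
    SELECT
        organization_id,
        sumMerge(request_sum) AS requests
    FROM mv_pipe_stats_daily
    WHERE organization_id IN (SELECT organization_id FROM top_orgs)
      AND timestamp >= '2025-12-01'
    GROUP BY organization_id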

12:33:20 - Is the quantile calculation correct?

12:34:37 - I want which quantile/decile the org falls into (e.g., "top 10%", "top 25%")
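
One way to turn per-organization totals into a "top X%" label is a rank divided by the total count of organizations. The bucket thresholds, the date cutoff, and the use of the historic monthly table are illustrative:

    WITH requests_per_org AS (
        SELECT
            organization_id,
            sum(request_sum) AS requests
        FROM historic_monthly_pipe_stats
        WHERE timestamp >= '2025-01-01'
        GROUP BY organization_id
    )
    SELECT
        organization_id,
        requests,
        multiIf(pct <= 0.10, 'top 10%',
                pct <= 0.25, 'top 25%',
                pct <= 0.50, 'top 50%',
                'bottom 50%') AS requests_bucket
    FROM (
        SELECT
            organization_id,
            requests,
            row_number() OVER (ORDER BY requests DESC) / count() OVER () AS pct
        FROM requests_per_org
    )
    WHERE organization_id = '<org_id>'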

12:38:29 - Is the percentile calculation right?

13:15:40 - Remove the quantiles calculation

13:19:34 - Add the top 3 workspace names and what percentage of the organization's total read rows they represent; same for total written rows (or bytes)
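
A sketch of the read-rows breakdown over the daily aggregate; the same shape would be repeated for written rows or bytes, and a full-year version would also UNION ALL the historic monthly table:

    WITH read_rows_per_workspace AS (
        SELECT
            workspace_name,
            sumMerge(read_rows) AS workspace_read_rows
        FROM mv_datasources_ops_log_daily
        WHERE organization_id = '<org_id>'
          AND timestamp >= '2025-12-01'      -- daily table only covers recent data
        GROUP BY workspace_name
    )
    SELECT
        workspace_name,
        workspace_read_rows,
        round(100 * workspace_read_rows / sum(workspace_read_rows) OVER (), 1) AS pct_of_org_read_rows
    FROM read_rows_per_workspace
    ORDER BY workspace_read_rows DESC
    LIMIT 3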

December 18, 2025

14:15:08 - Create a table (wrapped_organizations) to map each organization_id to another random ID that can be exposed publicly, then create a copy pipe to run once that maps all organization_ids from dim_organizations_all_rt into wrapped_organizations. Whenever I run the copy pipe it should append only organization_ids not present in wrapped_organizations and not modify the rest
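
A sketch of the mapping data source and the copy pipe, assuming Tinybird .datasource/.pipe syntax; the UUID choice, sorting key, and node name are illustrative, not taken from the session:

    SCHEMA >
        `organization_id` String,
        `wrapped_id` String

    ENGINE "MergeTree"
    ENGINE_SORTING_KEY "organization_id"

And the copy pipe, which appends only organizations that are not already mapped:

    NODE new_organizations
    SQL >
        SELECT
            organization_id,
            toString(generateUUIDv4()) AS wrapped_id
        FROM (SELECT DISTINCT organization_id FROM dim_organizations_all_rt)
        WHERE organization_id NOT IN (SELECT organization_id FROM wrapped_organizations)

    TYPE COPY
    TARGET_DATASOURCE wrapped_organizations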

14:20:37 - Now modify the wrapped_pipe_stats_summary endpoint to receive wrapped_id as a parameter (same Array type) instead of organization_id; take into account that we'd need to resolve the right organization_ids
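
The parameter swap could be done by resolving wrapped_id values back to organization_ids in a subquery. The Array template call mirrors the Array parameter type mentioned in the prompt; the selected columns and date cutoff are illustrative:

    %
    SELECT
        toStartOfMonth(timestamp) AS month,
        sumMerge(request_sum) AS request_sum
    FROM mv_pipe_stats_daily
    WHERE organization_id IN (
        SELECT organization_id
        FROM wrapped_organizations
        WHERE wrapped_id IN {{ Array(wrapped_id, 'String') }}
    )
      AND timestamp >= '2025-12-01'
    GROUP BY month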

December 19, 2025

11:12:40 - wrapped_pipe_stats_summary is returning the actual metrics multiplied by two; extract the requests_sum calculation to check what's going on

11:12:47 - Query the wrapped_pipe_stats_summary endpoint to examine the requests_sum calculation. Show me the SQL logic in the endpoint node where the final aggregation happens, particularly how request_sum is being calculated and whether there are any UNION ALL operations that might be causing duplication.
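
A diagnostic query along these lines (not from the session) would show whether both branches of the UNION ALL return rows for the same months, which is the usual cause of metrics doubling in this pattern:

    SELECT
        source,
        toStartOfMonth(timestamp) AS month,
        sum(requests) AS requests
    FROM (
        SELECT 'daily_mv' AS source, timestamp, sumMerge(request_sum) AS requests
        FROM mv_pipe_stats_daily
        WHERE organization_id = '<org_id>'
          AND timestamp >= '2025-12-01'
        GROUP BY timestamp

        UNION ALL

        SELECT 'historic_monthly' AS source, timestamp, request_sum AS requests
        FROM historic_monthly_pipe_stats
        WHERE organization_id = '<org_id>'
    )
    GROUP BY source, month
    ORDER BY month DESC, source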

11:14:14 - Use this wrapped_id
