# Just The Gist: Snowflake Dashboard on Data Sharing Usage

This gist is used for developing a Streamlit-based dashboard app on the data sharing usage views provided by Snowflake: https://docs.snowflake.com/en/sql-reference/data-sharing-usage/

These views offer telemetry insights into how consumers are using and adopting a data product or app. They are available to the marketplace provider, who can use them to gather insights and improve the product they are offering.
Instead of sharing a code base, I am adopting the pattern of sharing the project requirement document (PRD), which can be used by AI coding tools like Cortex Code to generate the executable code. This also gives the user the opportunity to make changes per their own criteria.
File: PRD_DATASHARING.md
This is the project requirement document (PRD) that was used to generate the Streamlit application. The version shared here is a refined form of the original PRD, which was raw and unformatted, but the core aspects of the project are captured here.
You can refer to this doc in plan mode and ask Cortex Code to generate the assets for you. Derive your own version by further modifying or enhancing it.
File: AGENTS.md
This file is my own version of the various rules to be followed by the coding agent during the implementation phase. Feel free to adopt or change as you please.
Version: 1.0
Purpose: Build a dashboard to help Snowflake Marketplace data providers understand how their listings are being consumed by customers.
1. Introduction
1.1 Background
Snowflake Marketplace enables data providers to share datasets with consumers across Snowflake accounts. However, understanding adoption patterns, consumer behavior, and usage trends requires querying multiple system views and synthesizing disparate metrics.
1.2 Objective
Build an interactive dashboard that provides data providers with actionable insights into:
Which data products are most popular
How consumers engage with listings (views, clicks, installs)
What specific tables and columns are being queried
Trends over time and consumer distribution
1.3 Target Users
Data product managers
Marketplace listing owners
Data engineering teams responsible for shared datasets
2. Data Sources
Snowflake provides usage telemetry through the SNOWFLAKE.DATA_SHARING_USAGE schema. The dashboard should leverage these views (the queries in this document reference LISTING_CONSUMPTION_DAILY, LISTING_TELEMETRY_DAILY, and LISTING_ACCESS_HISTORY).
3. Implementation Approach
3.1 Snapshot Tables
The DATA_SHARING_USAGE views can contain years of historical data, making direct queries slow. The recommended approach is to create snapshot tables that contain only the relevant time window.
Example SQL:
-- Create snapshot of consumption data for the past 90 days
CREATE OR REPLACE TRANSIENT TABLE listing_consumption_daily_snapshot AS
SELECT * FROM snowflake.data_sharing_usage.listing_consumption_daily
WHERE event_date >= CURRENT_DATE() - 90;
Recommendation: Create transient snapshot tables for all four views. Use a configurable date range (e.g., 90 days for production, 5 days for development/demo).
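The snapshot recommendation above can be scripted rather than typed per view. A minimal sketch that generates the DDL with a configurable window; the view names come from the queries in this document, and the date-column parameter accounts for LISTING_ACCESS_HISTORY using query_date instead of event_date (anything else here is illustrative):

```python
# Views referenced elsewhere in this PRD (illustrative subset).
VIEWS = [
    ("listing_consumption_daily", "event_date"),
    ("listing_telemetry_daily", "event_date"),
    ("listing_access_history", "query_date"),
]

def snapshot_ddl(view: str, date_col: str = "event_date", days: int = 90) -> str:
    """Build the CREATE TRANSIENT TABLE statement for one usage view."""
    return (
        f"CREATE OR REPLACE TRANSIENT TABLE {view}_snapshot AS\n"
        f"SELECT * FROM snowflake.data_sharing_usage.{view}\n"
        f"WHERE {date_col} >= CURRENT_DATE() - {days};"
    )

# Generate all snapshot statements; 5 days for dev, 90 for production.
statements = [snapshot_ddl(v, c, days=90) for v, c in VIEWS]
```

Each statement can then be executed through whatever Snowflake session the app uses, keeping the date range in one place.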
3.2 JSON Parsing for Access History
Extracting table and column access from the nested JSON requires LATERAL FLATTEN:
-- Extract object and column names from access history
WITH base AS (
SELECT
query_date,
share_name,
consumer_account_name,
f.value:objectName::VARCHAR AS object_name,
f.value:columns AS columns
FROM listing_access_history_snapshot AS t,
LATERAL FLATTEN(input => t.share_objects_accessed) AS f
)
SELECT b.query_date,
b.share_name,
b.consumer_account_name,
b.object_name,
c.value:columnName::VARCHAR AS column_name
FROM base AS b,
LATERAL FLATTEN(input => b.columns) AS c;
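To make the double LATERAL FLATTEN easier to reason about, the same unnesting can be mimicked in plain Python on one share_objects_accessed value. The JSON shape below is assumed from the query above (an array of objects, each with an objectName and a columns array):

```python
import json

# One share_objects_accessed VARIANT value, rendered as JSON.
# Object and column names are made up for illustration.
raw = json.dumps([
    {"objectName": "DB.SCHEMA.ORDERS",
     "columns": [{"columnName": "ORDER_ID"}, {"columnName": "AMOUNT"}]},
    {"objectName": "DB.SCHEMA.CUSTOMERS",
     "columns": [{"columnName": "CUSTOMER_ID"}]},
])

def flatten_access(share_objects_accessed: str) -> list:
    """Mimic the two FLATTENs: one row per (object_name, column_name)."""
    rows = []
    for obj in json.loads(share_objects_accessed):   # outer FLATTEN over objects
        for col in obj.get("columns", []):           # inner FLATTEN over columns
            rows.append((obj["objectName"], col["columnName"]))
    return rows
```

Each tuple in the result corresponds to one output row of the SQL above.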
4. Dashboard Pages: Access History
Purpose: Understand which specific tables and columns are being queried.
Filters:
Data Product (share name) selector
Consumer selector
KPIs:
| Metric | Description |
| --- | --- |
| Total Queries | COUNT(DISTINCT query_token) |
| Unique Consumers | Distinct consumer accounts |
| Active Days | Days with query activity |
Visualizations:
Table & Column Heatmap - When a specific data product is selected:
Y-axis: Table names
X-axis: Column names
Color intensity: Access frequency
Sort by total usage (most accessed at top/left)
Most Accessed Tables - Top 5 list with counts
Most Accessed Columns - Top 5 list with counts
Query Activity Over Time - Bar chart of daily query counts
Top Objects - Bar chart of most accessed objects
Access by Object Type - Pie chart (TABLE, VIEW, etc.)
Consumer Breakdown - Bar chart of queries per consumer
5. Non-Functional Requirements
5.1 Usability
Dashboard should load within 5 seconds when using snapshot tables
All charts should be interactive (hover for details, click to filter where appropriate)
Filters should apply dynamically without page refresh
5.2 Configuration
Database, schema, and connection settings should be configurable (environment variables or config file)
Snapshot table names should be configurable
Date range for analysis should be visible to users
5.3 Code Quality
Separate data loading logic from visualization logic
Use caching where appropriate to avoid redundant queries
Include logging for debugging
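The separation and caching points above can be sketched as a small data-access layer. Here functools.lru_cache stands in for Streamlit's @st.cache_data decorator, and run_query is a placeholder for a real Snowflake session call; the function and listing names are illustrative:

```python
from functools import lru_cache

def run_query(sql: str) -> list:
    """Placeholder for session.sql(sql).collect() or cursor.execute(sql)."""
    return []

# In the Streamlit app this would be @st.cache_data; lru_cache shows the
# same idea: repeated filter changes do not re-run identical queries.
@lru_cache(maxsize=32)
def load_daily_consumption(listing=None) -> list:
    # String interpolation kept simple per the PRD's stated non-goals.
    sql = ("SELECT event_date, SUM(job_count) AS total_jobs "
           "FROM listing_consumption_daily_snapshot")
    if listing:
        sql += f" WHERE listing_global_name = '{listing}'"
    return run_query(sql + " GROUP BY event_date ORDER BY event_date")
```

Chart code then only calls loaders like load_daily_consumption and never builds SQL itself, which keeps the two layers separate.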
6. Sample Implementation Queries
6.1 Get Date Range from Consumption Data
SELECT MIN(event_date) AS min_date,
MAX(event_date) AS max_date,
DATEDIFF(day, MIN(event_date), MAX(event_date)) AS days_span
FROM listing_consumption_daily_snapshot;
6.2 Top Listings by Consumption
SELECT
listing_global_name,
SUM(job_count) AS total_jobs,
COUNT(DISTINCT consumer_account_name) AS unique_consumers
FROM listing_consumption_daily_snapshot
GROUP BY listing_global_name
ORDER BY total_jobs DESC LIMIT 10;
6.3 Daily Consumption Trend
SELECT
event_date,
SUM(job_count) AS total_jobs
FROM listing_consumption_daily_snapshot
GROUP BY event_date
ORDER BY event_date;
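One practical gotcha for the daily trend bar chart: the view has no rows for days with zero consumption, so the x-axis will silently skip dates. A sketch of filling those gaps after fetching the query results (pandas' asfreq/reindex would do the same; this uses only the standard library):

```python
from datetime import date, timedelta

def fill_date_gaps(daily: dict) -> list:
    """daily maps event_date -> total_jobs (from query 6.3).
    Returns (date, jobs) pairs for every day in the observed range,
    inserting 0 for days with no consumption rows."""
    if not daily:
        return []
    d, end = min(daily), max(daily)
    out = []
    while d <= end:
        out.append((d, daily.get(d, 0)))
        d += timedelta(days=1)
    return out
```

Feeding the filled series to the bar chart keeps the time axis continuous.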
6.4 Telemetry by Event Type and Action
SELECT
event_type,
action,
SUM(event_count) AS total_events
FROM listing_telemetry_daily_snapshot
GROUP BY event_type, action
ORDER BY total_events DESC;
6.5 Conversion Funnel Metrics
SELECT SUM(CASE WHEN event_type = 'LISTING_VIEW' THEN event_count ELSE 0 END) AS views,
SUM(CASE WHEN event_type = 'CLICKS' THEN event_count ELSE 0 END) AS clicks,
SUM(CASE WHEN action = 'GET_STARTED' THEN event_count ELSE 0 END) AS gets_started,
SUM(CASE WHEN action = 'GET_COMPLETED' THEN event_count ELSE 0 END) AS gets_completed
FROM listing_telemetry_daily_snapshot;
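The funnel page needs step-to-step conversion rates derived from these four counts. A small helper like the following (the rate names are illustrative) guards against division by zero when a listing has no traffic yet:

```python
def funnel_rates(views: int, clicks: int, starts: int, completes: int) -> dict:
    """Step-to-step conversion rates for the telemetry funnel."""
    def rate(num, den):
        # Avoid ZeroDivisionError for listings with no activity.
        return round(num / den, 3) if den else 0.0
    return {
        "view_to_click": rate(clicks, views),
        "click_to_start": rate(starts, clicks),
        "start_to_complete": rate(completes, starts),
        "overall": rate(completes, views),
    }
```

The resulting dict maps directly onto funnel-stage annotations in the chart.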
6.6 Build Table/Column Access Heatmap Data
WITH parsed_access AS (
SELECT
share_name,
SPLIT_PART(f.value:objectName::VARCHAR, '.', -1) AS table_name,
c.value:columnName::VARCHAR AS column_name,
COUNT(*) AS access_count
FROM listing_access_history_snapshot AS t,
LATERAL FLATTEN(input => t.share_objects_accessed) AS f,
LATERAL FLATTEN(input => f.value:columns) AS c
WHERE share_name = '<selected_share>'
GROUP BY share_name, table_name, column_name
)
SELECT * FROM parsed_access
ORDER BY access_count DESC;
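The heatmap requirement from section 4 also asks for tables and columns sorted by total usage. One way to shape the rows from query 6.6 for the chart (a pandas pivot_table would work equally well; this sketch uses only the standard library, and the table/column names in the test data are made up):

```python
from collections import Counter

def heatmap_matrix(rows: list):
    """rows: (table_name, column_name, access_count) tuples from query 6.6.
    Returns (tables, columns, counts): axis labels sorted by total usage
    (most accessed first) and a {(table, column): count} lookup for the
    heatmap's color intensity."""
    counts = Counter()
    table_totals, col_totals = Counter(), Counter()
    for table, col, n in rows:
        counts[(table, col)] += n
        table_totals[table] += n
        col_totals[col] += n
    tables = [t for t, _ in table_totals.most_common()]
    cols = [c for c, _ in col_totals.most_common()]
    return tables, cols, counts
```

The sorted axis labels satisfy the "most accessed at top/left" requirement, and the counts dict feeds the color scale.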
7. Success Criteria
A complete implementation should demonstrate:
Data Pipeline: Successfully creates and queries snapshot tables
Executive View: Home page with KPIs and top listings visualization
Drill-Down Capability: Each page allows filtering by listing
Conversion Insights: Telemetry page shows a meaningful funnel visualization
Granular Access Analysis: Access History page displays table/column heatmap
Interactivity: Charts respond to filter changes; hover shows details
Performance: Pages load quickly using snapshot approach
This PRD provides requirements for building a Snowflake Data Sharing Usage Dashboard. Implementations may vary in UI design, chart library choices, and additional features.
Follow these guidelines as you develop / implement the codebase.
Develop the code using Python.
Use UV for library management.
For visualization, use Streamlit.
There is no need to package and distribute the application.
The UV environment is already set up and has some libraries installed; do not create it from scratch.
All source should be stored in src/ folder.
Do not create solution documentation unless I specifically ask you to. Stop wasting credits unnecessarily.
All logs should be stored in the logs/ folder
Specifications would be stored in the spec/ folder
IGNORE the contents in temp/ folder, do not even read or process it. It is a temporary store for my own saving notes, dummy scripts etc..
All tasks & activities should be stored in the file TASKS.md. You should also update the task status as you make progress.
You should keep track of suggestions, issues you faced, changes in implementation direction, etc. in the file HANDOFF.md.
The file HANDOFF.md should be referred to for insights & context from previous sessions.
If there are changes in the plan or direction of implementation, for example due to unforeseeable errors, update TASKS.md accordingly with the new steps to be implemented.
All data used for testing and demo should be stored in the data/ folder.
Environment and configurations
The various environment-specific settings are stored in a local '.env' file. Inspect this file to understand the configuration.
Environment variable hardcoding
Do not embed environment-specific values (e.g., hostname, role, warehouse names) in source code, scripts, or documentation. Always load them from the project_config.yml file. If a scenario exists where this is not possible, consult with me on the next plan of action.
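As a sketch of how the app could honor this rule, all environment values can flow through one loader. The minimal flat-file parser below is a stand-in for a real YAML loader (PyYAML's yaml.safe_load); the key names are illustrative, not prescribed by this document:

```python
from pathlib import Path

def load_config(path: str = "project_config.yml") -> dict:
    """Minimal loader for a flat 'key: value' config file.
    A stand-in for yaml.safe_load; handles comments and blank lines only,
    so nested YAML or values containing '#' would need the real parser."""
    config = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()   # strip comments and whitespace
        if ":" in line:
            key, value = line.split(":", 1)
            config[key.strip()] = value.strip()
    return config
```

Code then references config["warehouse"] and similar keys instead of literal names, so nothing environment-specific lives in the source.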
Scripting
Prefer bash scripts stored in the bin/ folder over running commands one at a time. The output of the commands in a script should be logged to a file in the logs/ folder.
NO Inline Script Execution
DO NOT execute Python code inline using any of these patterns:
python -c "..."
python3 -c "..."
uv run python -c "..."
Heredoc patterns like python << EOF ... EOF
INSTEAD, always:
Create a proper Python script file in src/ or bin/ folder
Run the script using uv run python path/to/script.py
This applies even for "quick" one-off tasks like:
Modifying JSON/notebook files
Checking file contents
Data transformations
Any multi-line Python logic
Exception: Simple single-line expressions for verification are acceptable, e.g., python3 -c "print('hello')"
Implementation aspects
IMPORTANT This code base is a demonstration for self learning only. Hence keep the implementation simple and understandable and easy to read. I prefer for now not to worry about:
Security
rigorous exception handling
test code
performance etc..
If in doubt, ask yourself the question, will this code be understandable by a junior developer? Keep it simple.
Disallowed Commands
The following commands are not allowed; confirm with me before proceeding:
rm and its options
rmdir
DROP database, compute pools and other Snowflake objects that are not Table or Views