# Just The Gist: Snowflake Dashboard on Data Sharing Usage

This gist is used for developing a Streamlit-based dashboard app on the data sharing usage views provided by Snowflake: https://docs.snowflake.com/en/sql-reference/data-sharing-usage/

These views offer telemetry insights into how consumers are using and adopting a data product or app. They are available to the marketplace provider, who can use them to gather insights and improve the product they are offering.
Instead of sharing a code base, I am adopting the pattern of sharing the project requirement document (PRD), which can be used by AI coding tools like Cortex Code to generate the executable code. This also gives the user the opportunity to make changes per their own criteria.
File: PRD_DATASHARING.md
This is the project requirement document (PRD) that was used to generate the Streamlit application. The version shared here is a refined form of the original PRD, which was raw and unformatted, but the core aspects of the project are captured here.
You can refer to this doc in plan mode and ask Cortex Code to generate the assets for you. Derive your own version by further modifying or enhancing it.
File: AGENTS.md
This file is my own version of the various rules to be followed by the coding agent during the implementation phase. Feel free to adopt or change as you please.
Version: 1.0
Purpose: Build a dashboard to help Snowflake Marketplace data providers understand how their listings are being consumed by customers.
1. Introduction
1.1 Background
Snowflake Marketplace enables data providers to share datasets with consumers across Snowflake accounts. However, understanding adoption patterns, consumer behavior, and usage trends requires querying multiple system views and synthesizing disparate metrics.
1.2 Objective
Build an interactive dashboard that provides data providers with actionable insights into:
Which data products are most popular
How consumers engage with listings (views, clicks, installs)
What specific tables and columns are being queried
Trends over time and consumer distribution
1.3 Target Users
Data product managers
Marketplace listing owners
Data engineering teams responsible for shared datasets
2. Data Sources
Snowflake provides usage telemetry through the SNOWFLAKE.DATA_SHARING_USAGE schema. The dashboard should leverage these views (the queries in this document reference LISTING_CONSUMPTION_DAILY, LISTING_TELEMETRY_DAILY, and LISTING_ACCESS_HISTORY).
3. Implementation Approach
3.1 Snapshot Tables
The DATA_SHARING_USAGE views can contain years of historical data, making direct queries slow. The recommended approach is to create snapshot tables that contain only the relevant time window.
Example SQL:
-- Create snapshot of consumption data for the past 90 days
CREATE OR REPLACE TRANSIENT TABLE listing_consumption_daily_snapshot AS
SELECT * FROM snowflake.data_sharing_usage.listing_consumption_daily
WHERE event_date >= CURRENT_DATE() - 90;
Recommendation: Create transient snapshot tables for all four views. Use a configurable date range (e.g., 90 days for production, 5 days for development/demo).
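The snapshot recommendation above can be scripted rather than typed per view. A minimal sketch that generates the DDL with a configurable window; the view names come from the queries in this document, and the date-column parameter accounts for LISTING_ACCESS_HISTORY using query_date instead of event_date (anything else here is illustrative):

```python
# Views referenced elsewhere in this PRD (illustrative subset).
VIEWS = [
    ("listing_consumption_daily", "event_date"),
    ("listing_telemetry_daily", "event_date"),
    ("listing_access_history", "query_date"),
]

def snapshot_ddl(view: str, date_col: str = "event_date", days: int = 90) -> str:
    """Build the CREATE TRANSIENT TABLE statement for one usage view."""
    return (
        f"CREATE OR REPLACE TRANSIENT TABLE {view}_snapshot AS\n"
        f"SELECT * FROM snowflake.data_sharing_usage.{view}\n"
        f"WHERE {date_col} >= CURRENT_DATE() - {days};"
    )

# Generate all snapshot statements; 5 days for dev, 90 for production.
statements = [snapshot_ddl(v, c, days=90) for v, c in VIEWS]
```

Each statement can then be executed through whatever Snowflake session the app uses, keeping the date range in one place.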
3.2 JSON Parsing for Access History
Extracting table and column access from the nested JSON requires LATERAL FLATTEN:
-- Extract object and column names from access history
WITH base AS (
SELECT
query_date,
share_name,
consumer_account_name,
f.value:objectName::VARCHAR AS object_name,
f.value:columns AS columns
FROM listing_access_history_snapshot AS t,
LATERAL FLATTEN(input => t.share_objects_accessed) AS f
)
SELECT b.query_date,
b.share_name,
b.consumer_account_name,
b.object_name,
c.value:columnName::VARCHAR AS column_name
FROM base AS b,
LATERAL FLATTEN(input => b.columns) AS c;
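To make the double LATERAL FLATTEN easier to reason about, the same unnesting can be mimicked in plain Python on one share_objects_accessed value. The JSON shape below is assumed from the query above (an array of objects, each with an objectName and a columns array):

```python
import json

# One share_objects_accessed VARIANT value, rendered as JSON.
# Object and column names are made up for illustration.
raw = json.dumps([
    {"objectName": "DB.SCHEMA.ORDERS",
     "columns": [{"columnName": "ORDER_ID"}, {"columnName": "AMOUNT"}]},
    {"objectName": "DB.SCHEMA.CUSTOMERS",
     "columns": [{"columnName": "CUSTOMER_ID"}]},
])

def flatten_access(share_objects_accessed: str) -> list:
    """Mimic the two FLATTENs: one row per (object_name, column_name)."""
    rows = []
    for obj in json.loads(share_objects_accessed):   # outer FLATTEN over objects
        for col in obj.get("columns", []):           # inner FLATTEN over columns
            rows.append((obj["objectName"], col["columnName"]))
    return rows
```

Each tuple in the result corresponds to one output row of the SQL above.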
4. Dashboard Pages: Access History
Purpose: Understand which specific tables and columns are being queried.
Filters:
Data Product (share name) selector
Consumer selector
KPIs:
| Metric | Description |
| --- | --- |
| Total Queries | COUNT(DISTINCT query_token) |
| Unique Consumers | Distinct consumer accounts |
| Active Days | Days with query activity |
Visualizations:
Table & Column Heatmap - When a specific data product is selected:
Y-axis: Table names
X-axis: Column names
Color intensity: Access frequency
Sort by total usage (most accessed at top/left)
Most Accessed Tables - Top 5 list with counts
Most Accessed Columns - Top 5 list with counts
Query Activity Over Time - Bar chart of daily query counts
Top Objects - Bar chart of most accessed objects
Access by Object Type - Pie chart (TABLE, VIEW, etc.)
Consumer Breakdown - Bar chart of queries per consumer
5. Non-Functional Requirements
5.1 Usability
Dashboard should load within 5 seconds when using snapshot tables
All charts should be interactive (hover for details, click to filter where appropriate)
Filters should apply dynamically without page refresh
5.2 Configuration
Database, schema, and connection settings should be configurable (environment variables or config file)
Snapshot table names should be configurable
Date range for analysis should be visible to users
5.3 Code Quality
Separate data loading logic from visualization logic
Use caching where appropriate to avoid redundant queries
Include logging for debugging
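The separation and caching points above can be sketched as a small data-access layer. Here functools.lru_cache stands in for Streamlit's @st.cache_data decorator, and run_query is a placeholder for a real Snowflake session call; the function and listing names are illustrative:

```python
from functools import lru_cache

def run_query(sql: str) -> list:
    """Placeholder for session.sql(sql).collect() or cursor.execute(sql)."""
    return []

# In the Streamlit app this would be @st.cache_data; lru_cache shows the
# same idea: repeated filter changes do not re-run identical queries.
@lru_cache(maxsize=32)
def load_daily_consumption(listing=None) -> list:
    # String interpolation kept simple per the PRD's stated non-goals.
    sql = ("SELECT event_date, SUM(job_count) AS total_jobs "
           "FROM listing_consumption_daily_snapshot")
    if listing:
        sql += f" WHERE listing_global_name = '{listing}'"
    return run_query(sql + " GROUP BY event_date ORDER BY event_date")
```

Chart code then only calls loaders like load_daily_consumption and never builds SQL itself, which keeps the two layers separate.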
6. Sample Implementation Queries
6.1 Get Date Range from Consumption Data
SELECT MIN(event_date) AS min_date,
MAX(event_date) AS max_date,
DATEDIFF(day, MIN(event_date), MAX(event_date)) AS days_span
FROM listing_consumption_daily_snapshot;
6.2 Top Listings by Consumption
SELECT
listing_global_name,
SUM(job_count) AS total_jobs,
COUNT(DISTINCT consumer_account_name) AS unique_consumers
FROM listing_consumption_daily_snapshot
GROUP BY listing_global_name
ORDER BY total_jobs DESC LIMIT 10;
6.3 Daily Consumption Trend
SELECT
event_date,
SUM(job_count) AS total_jobs
FROM listing_consumption_daily_snapshot
GROUP BY event_date
ORDER BY event_date;
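One practical gotcha for the daily trend bar chart: the view has no rows for days with zero consumption, so the x-axis will silently skip dates. A sketch of filling those gaps after fetching the query results (pandas' asfreq/reindex would do the same; this uses only the standard library):

```python
from datetime import date, timedelta

def fill_date_gaps(daily: dict) -> list:
    """daily maps event_date -> total_jobs (from query 6.3).
    Returns (date, jobs) pairs for every day in the observed range,
    inserting 0 for days with no consumption rows."""
    if not daily:
        return []
    d, end = min(daily), max(daily)
    out = []
    while d <= end:
        out.append((d, daily.get(d, 0)))
        d += timedelta(days=1)
    return out
```

Feeding the filled series to the bar chart keeps the time axis continuous.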
6.4 Telemetry by Event Type and Action
SELECT
event_type,
action,
SUM(event_count) AS total_events
FROM listing_telemetry_daily_snapshot
GROUP BY event_type, action
ORDER BY total_events DESC;
6.5 Conversion Funnel Metrics
SELECT SUM(CASE WHEN event_type = 'LISTING_VIEW' THEN event_count ELSE 0 END) AS views,
SUM(CASE WHEN event_type = 'CLICKS' THEN event_count ELSE 0 END) AS clicks,
SUM(CASE WHEN action = 'GET_STARTED' THEN event_count ELSE 0 END) AS gets_started,
SUM(CASE WHEN action = 'GET_COMPLETED' THEN event_count ELSE 0 END) AS gets_completed
FROM listing_telemetry_daily_snapshot;
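The funnel page needs step-to-step conversion rates derived from these four counts. A small helper like the following (the rate names are illustrative) guards against division by zero when a listing has no traffic yet:

```python
def funnel_rates(views: int, clicks: int, starts: int, completes: int) -> dict:
    """Step-to-step conversion rates for the telemetry funnel."""
    def rate(num, den):
        # Avoid ZeroDivisionError for listings with no activity.
        return round(num / den, 3) if den else 0.0
    return {
        "view_to_click": rate(clicks, views),
        "click_to_start": rate(starts, clicks),
        "start_to_complete": rate(completes, starts),
        "overall": rate(completes, views),
    }
```

The resulting dict maps directly onto funnel-stage annotations in the chart.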
6.6 Build Table/Column Access Heatmap Data
WITH parsed_access AS (
SELECT
share_name,
SPLIT_PART(f.value:objectName::VARCHAR, '.', -1) AS table_name,
c.value:columnName::VARCHAR AS column_name,
COUNT(*) AS access_count
FROM listing_access_history_snapshot AS t,
LATERAL FLATTEN(input => t.share_objects_accessed) AS f,
LATERAL FLATTEN(input => f.value:columns) AS c
WHERE share_name = '<selected_share>'
GROUP BY share_name, table_name, column_name
)
SELECT * FROM parsed_access
ORDER BY access_count DESC;
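The heatmap requirement from section 4 also asks for tables and columns sorted by total usage. One way to shape the rows from query 6.6 for the chart (a pandas pivot_table would work equally well; this sketch uses only the standard library, and the table/column names in the test data are made up):

```python
from collections import Counter

def heatmap_matrix(rows: list):
    """rows: (table_name, column_name, access_count) tuples from query 6.6.
    Returns (tables, columns, counts): axis labels sorted by total usage
    (most accessed first) and a {(table, column): count} lookup for the
    heatmap's color intensity."""
    counts = Counter()
    table_totals, col_totals = Counter(), Counter()
    for table, col, n in rows:
        counts[(table, col)] += n
        table_totals[table] += n
        col_totals[col] += n
    tables = [t for t, _ in table_totals.most_common()]
    cols = [c for c, _ in col_totals.most_common()]
    return tables, cols, counts
```

The sorted axis labels satisfy the "most accessed at top/left" requirement, and the counts dict feeds the color scale.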
7. Success Criteria
A complete implementation should demonstrate:
Data Pipeline: Successfully creates and queries snapshot tables
Executive View: Home page with KPIs and top listings visualization
Drill-Down Capability: Each page allows filtering by listing
Conversion Insights: Telemetry page shows a meaningful funnel visualization
Granular Access Analysis: Access History page displays table/column heatmap
Interactivity: Charts respond to filter changes; hover shows details
Performance: Pages load quickly using snapshot approach
This PRD provides requirements for building a Snowflake Data Sharing Usage Dashboard. Implementations may vary in UI design, chart library choices, and additional features.
Follow these guidelines as you develop / implement the codebase.
Develop the code using Python.
Use UV for library management.
For visualization, use Streamlit.
There is no need to package and distribute the application.
The UV environment is already set up and has some libraries installed; do not create it from scratch.
All source should be stored in src/ folder.
Do not create solution documentation unless I specifically ask you to. Stop wasting credits unnecessarily.
All logs should be stored in the logs/ folder
Specifications would be stored in the spec/ folder
IGNORE the contents in temp/ folder, do not even read or process it. It is a temporary store for my own saving notes, dummy scripts etc..
All tasks & activities should be stored in the file TASKS.md. You should also update the task status as you make progress.
You should keep track of suggestions, issues you faced, changes in implementation direction, etc. in the file HANDOFF.md.
The file HANDOFF.md should be referred to for insights & context from previous sessions.
If there are changes in the plan or direction of implementation, for example due to unforeseeable errors, update TASKS.md accordingly with the new steps to be implemented.
All data used for testing and demo should be stored in the data/ folder.
Environment and configurations
The various environment-specific settings are stored in a local '.env' file. Inspect this file to understand the configuration.
Environment variable hardcoding
Do not embed environment-specific values (e.g., hostname, role, warehouse names) in source code, scripts, or documentation. Always load them from the project_config.yml file. If a scenario exists where this is not possible, consult with me on the next plan of action.
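As a sketch of how the app could honor this rule, all environment values can flow through one loader. The minimal flat-file parser below is a stand-in for a real YAML loader (PyYAML's yaml.safe_load); the key names are illustrative, not prescribed by this document:

```python
from pathlib import Path

def load_config(path: str = "project_config.yml") -> dict:
    """Minimal loader for a flat 'key: value' config file.
    A stand-in for yaml.safe_load; handles comments and blank lines only,
    so nested YAML or values containing '#' would need the real parser."""
    config = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()   # strip comments and whitespace
        if ":" in line:
            key, value = line.split(":", 1)
            config[key.strip()] = value.strip()
    return config
```

Code then references config["warehouse"] and similar keys instead of literal names, so nothing environment-specific lives in the source.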
Scripting
Prefer bash scripts stored in the bin/ folder over running commands one at a time. The output of the commands in a script should be logged to a file in the logs/ folder.
NO Inline Script Execution
DO NOT execute Python code inline using any of these patterns:
python -c "..."
python3 -c "..."
uv run python -c "..."
Heredoc patterns like python << EOF ... EOF
INSTEAD, always:
Create a proper Python script file in src/ or bin/ folder
Run the script using uv run python path/to/script.py
This applies even for "quick" one-off tasks like:
Modifying JSON/notebook files
Checking file contents
Data transformations
Any multi-line Python logic
Exception: Simple single-line expressions for verification are acceptable, e.g., python3 -c "print('hello')"
Implementation aspects
IMPORTANT This code base is a demonstration for self learning only. Hence keep the implementation simple and understandable and easy to read. I prefer for now not to worry about:
Security
rigorous exception handling
test code
performance etc..
If in doubt, ask yourself the question, will this code be understandable by a junior developer? Keep it simple.
Disallowed Commands
The following commands are not allowed; confirm with me before proceeding:
rm and its options
rmdir
DROP database, compute pools and other Snowflake objects that are not Table or Views