This guide explains how to manually import historical analytics events into OpenPanel using TypeScript. Use this when you need full control over the import process or have a custom data format.
- Introduction
- Understanding the Event Schema
- Session Reconstruction Logic
- Session Aggregation Table
- Inserting into ClickHouse
- Complete Example
- Important Considerations
- Testing & Validation
OpenPanel stores analytics data in ClickHouse using two main tables:
events: Individual user interactions (page views, clicks, custom events)sessions: Aggregated view of user sessions (grouped events with 30-minute timeout)
Each event is linked to a device (anonymous visitor), a profile (identified user), and a session (group of related events).
Use this guide when:
- When you're self-hosting and your data is not from a common provider
- Node.js 18+ with TypeScript
- Access to your OpenPanel ClickHouse instance
- Your OpenPanel project ID
- Historical event data in CSV, JSON, or similar format
OpenPanel events follow a specific schema in ClickHouse. Here's what each field means and how to populate it:
These fields establish user and session identity:
interface EventIdentity {
// Anonymous device identifier (like a browser cookie)
// Use a consistent hash of IP + User-Agent, or your platform's visitor ID
device_id: string;
// User identifier for logged-in users
// Set to device_id if user is anonymous
// Set to actual user ID (email, UUID, etc.) if user is identified
profile_id: string;
// Session identifier - groups related events together
// Generate using UUID v4, must be consistent for events in same session
// See "Session Reconstruction Logic" section for how to assign this
session_id: string;
// Your OpenPanel project ID (get from Settings > Details)
project_id: string;
}Key relationships:
device_idtracks anonymous visitors across sessionsprofile_idtracks identified users (can be same asdevice_idfor anonymous users)session_idgroups events within a 30-minute window- Multiple sessions can have the same
device_idorprofile_id
interface CoreEventFields {
// Unique event ID - generate with UUID v4
id: string;
// Event name (e.g., "screen_view", "button_click", "purchase")
// Use "screen_view" for page views
// Use "session_start" and "session_end" for session boundaries
name: string;
// Event timestamp in ISO format: "2024-01-15 14:30:00"
// ClickHouse expects: YYYY-MM-DD HH:MM:SS format (no T, no Z)
created_at: string;
// URL path without domain (e.g., "/blog/article-1")
path: string;
// Full origin URL (e.g., "https://example.com")
origin: string;
// Where the user came from (full URL)
// Set to empty string if direct traffic or same domain
referrer: string;
// Human-readable referrer source (e.g., "Google", "Facebook", "Direct")
referrer_name: string;
// Referrer category: "search", "social", "referral", "email", "paid", "direct"
referrer_type: string;
// Event duration in milliseconds (usually 0 for most events)
// For session_end events, set to: last_event_time - first_event_time
duration: number;
// Custom event data as flat key-value object
// Nested objects are flattened using dot notation
// Example: { "button_id": "signup", "user.plan": "premium", "price": 99.99 }
properties: Record<string, string | number | boolean | null>;
}The properties field is a flexible key-value store for custom event data. Understanding its structure is crucial for proper data import.
// Properties is always a flat object - nested data uses dot notation
properties: {
"button_id": "signup", // Simple string
"price": 99.99, // Number
"is_premium": true, // Boolean
"user.name": "John", // Nested using dot notation
"metadata.source": "mobile_app" // Multi-level nesting
}Important: ClickHouse stores properties as a flat key-value object. If your source data has nested objects, flatten them using dot notation:
// β Wrong: Nested objects won't work in ClickHouse
properties: {
user: {
name: "John",
plan: "premium"
}
}
// β
Correct: Flatten using dot notation
properties: {
"user.name": "John",
"user.plan": "premium"
}
// Helper function to flatten objects
function flattenObject(obj: any, prefix = ''): Record<string, string | number | boolean | null> {
const flattened: Record<string, any> = {};
for (const [key, value] of Object.entries(obj)) {
const newKey = prefix ? `${prefix}.${key}` : key;
if (value && typeof value === 'object' && !Array.isArray(value)) {
Object.assign(flattened, flattenObject(value, newKey));
} else {
flattened[newKey] = value;
}
}
return flattened;
}OpenPanel uses specific property keys for special purposes. Always prefix these with __ (double underscore):
properties: {
// URL query parameters (extracted from page URL)
"__query": {
"utm_source": "google",
"utm_medium": "cpc",
"utm_campaign": "summer_sale",
"ref": "newsletter"
},
// Page metadata
"__title": "Product Page - Item XYZ",
"__screen": "1920x1080",
"__language": "en-US",
// Your custom properties (no prefix needed)
"product_id": "xyz-123",
"category": "electronics"
}Properties you SHOULD include for imports:
__query: URL query parameters (essential for attribution)__title: Page title__screen: Screen resolution__language: User language- Any custom event properties
Properties you should NOT include (these are auto-generated by real-time system):
__path,__referrer: Use top-levelpath,origin,referrerfields instead__user_agent: Use top-level device/browser fields instead__hash: URL hash fragment (not typically preserved in imports)__reqId: Internal request tracking ID
OpenPanel stores URL query parameters in a special __query nested object within properties. This is essential for marketing attribution:
// Example: User visits from Google Ads
// URL: https://example.com/product?utm_source=google&utm_medium=cpc&utm_campaign=sale&product_id=123
properties: {
"__query": {
"utm_source": "google", // Traffic source
"utm_medium": "cpc", // Marketing medium
"utm_campaign": "sale", // Campaign name
"utm_content": "banner_ad", // Ad content (optional)
"utm_term": "shoes", // Search term (optional)
"product_id": "123" // Other query params
}
}How to extract query parameters from URLs:
function extractQueryParams(url: string): Record<string, string> {
try {
const urlObj = new URL(url);
const params: Record<string, string> = {};
// Extract all query parameters
for (const [key, value] of urlObj.searchParams.entries()) {
params[key] = value;
}
return params;
} catch {
return {};
}
}
// Usage in event transformation
const query = extractQueryParams(raw.page_url);
const properties: Record<string, any> = {};
// Store query params if present
if (Object.keys(query).length > 0) {
properties.__query = query;
}
// Alternative: Store individual UTM params at root level (they'll be nested automatically)
// This approach matches the Umami provider pattern
if (query.utm_source) {
if (!properties.__query) properties.__query = {};
properties.__query.utm_source = query.utm_source;
}
if (query.utm_medium) {
if (!properties.__query) properties.__query = {};
properties.__query.utm_medium = query.utm_medium;
}
// ... repeat for utm_campaign, utm_content, utm_termUTM parameters are critical for OpenPanel's attribution system. The sessions table automatically extracts these for aggregation:
// Session aggregate will have top-level UTM fields
interface SessionAggregate {
// ... other fields
// These are extracted from the first event's properties.__query
utm_source: string; // From properties.__query.utm_source
utm_medium: string; // From properties.__query.utm_medium
utm_campaign: string; // From properties.__query.utm_campaign
utm_content: string; // From properties.__query.utm_content
utm_term: string; // From properties.__query.utm_term
}
// Helper function to extract UTM from properties
function extractUTMFromProperties(properties: Record<string, any>): {
utm_source: string;
utm_medium: string;
utm_campaign: string;
utm_content: string;
utm_term: string;
} {
const query = properties.__query || {};
return {
utm_source: String(query.utm_source || ''),
utm_medium: String(query.utm_medium || ''),
utm_campaign: String(query.utm_campaign || ''),
utm_content: String(query.utm_content || ''),
utm_term: String(query.utm_term || ''),
};
}Here are recommended patterns for different types of events:
// Page view event
{
name: "screen_view",
properties: {
"__query": { "utm_source": "google", "ref": "homepage" },
"__title": "Product Details",
"__screen": "1920x1080",
"__language": "en-US"
}
}
// E-commerce event
{
name: "purchase",
properties: {
"order_id": "ORD-12345",
"revenue": 149.99,
"currency": "USD",
"items": "3", // Store count, not array
"item.1.id": "PROD-A", // Use dot notation for item details
"item.1.price": 49.99,
"item.2.id": "PROD-B",
"item.2.price": 100.00
}
}
// Custom interaction event
{
name: "button_click",
properties: {
"button_id": "signup_cta",
"button_text": "Start Free Trial",
"page_section": "hero",
"user.is_logged_in": false
}
}Location and device information for analytics:
interface GeoDeviceFields {
// Geographic location
country: string; // ISO country code (e.g., "US", "GB")
city: string; // City name
region: string; // State/province code (e.g., "CA", "NY")
longitude: number | null; // GPS coordinates
latitude: number | null;
// Device information
os: string; // Operating system (e.g., "Windows", "macOS", "iOS")
os_version: string; // OS version (e.g., "11.0")
browser: string; // Browser name (e.g., "Chrome", "Safari", "Firefox")
browser_version: string; // Browser version (e.g., "120.0")
device: string; // Device type: "desktop", "mobile", "tablet", "smarttv"
brand: string; // Device brand (e.g., "Apple", "Samsung")
model: string; // Device model (e.g., "iPhone 14 Pro")
}interface MetadataFields {
// Timestamp when this event was imported (ISO format)
imported_at: string;
// SDK identifier (use "manual-import" or your platform name)
sdk_name: string;
// SDK version (use "1.0.0" or your import script version)
sdk_version: string;
}interface OpenPanelEvent {
// Identity
id: string;
device_id: string;
profile_id: string;
session_id: string;
project_id: string;
// Core
name: string;
created_at: string;
path: string;
origin: string;
referrer: string;
referrer_name: string;
referrer_type: string;
duration: number;
properties: Record<string, string | number | boolean | null>;
// Geo/Device
country: string;
city: string;
region: string;
longitude: number | null;
latitude: number | null;
os: string;
os_version: string;
browser: string;
browser_version: string;
device: string;
brand: string;
model: string;
// Metadata
imported_at: string;
sdk_name: string;
sdk_version: string;
}Sessions group related events together using a 30-minute inactivity timeout. If a user is inactive for more than 30 minutes, a new session starts.
1. Sort all events by (device_id, created_at) - chronological order per device
2. Initialize empty session tracking map
3. For each event in sorted order:
a. Look up last event for this device_id
b. Calculate time gap since last event
c. If gap > 30 minutes OR no previous event:
- Generate new session_id (UUID)
- Create session_start event (1 second before first event)
- Store session metadata (first event, counts, etc.)
d. Else (continuing existing session):
- Use existing session_id
- Update session metadata (last event, counts)
4. After processing all events:
- Create session_end event for each session (1 second after last event)
import { randomUUID } from 'crypto';
interface SessionState {
sessionId: string;
firstEvent: OpenPanelEvent;
lastEvent: OpenPanelEvent;
eventCount: number;
screenViewCount: number;
screenViews: Set<string>; // unique paths visited
revenue: number; // sum of revenue from events
}
const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes in milliseconds
function reconstructSessions(events: OpenPanelEvent[]): {
eventsWithSessions: OpenPanelEvent[];
sessionStarts: OpenPanelEvent[];
sessionEnds: OpenPanelEvent[];
sessionAggregates: SessionAggregate[];
} {
// Step 1: Sort events by device and time
const sortedEvents = [...events].sort((a, b) => {
const deviceCompare = a.device_id.localeCompare(b.device_id);
if (deviceCompare !== 0) return deviceCompare;
return new Date(a.created_at).getTime() - new Date(b.created_at).getTime();
});
// Step 2: Track active sessions per device
const deviceSessions = new Map<string, SessionState>();
const sessionStarts: OpenPanelEvent[] = [];
const sessionEnds: OpenPanelEvent[] = [];
const eventsWithSessions: OpenPanelEvent[] = [];
// Step 3: Process each event
for (const event of sortedEvents) {
const lastSession = deviceSessions.get(event.device_id);
const eventTime = new Date(event.created_at).getTime();
let currentSession: SessionState;
if (!lastSession) {
// First event for this device - start new session
currentSession = startNewSession(event, eventTime, sessionStarts);
deviceSessions.set(event.device_id, currentSession);
} else {
// Check if we should continue existing session or start new one
const lastEventTime = new Date(lastSession.lastEvent.created_at).getTime();
const timeSinceLastEvent = eventTime - lastEventTime;
if (timeSinceLastEvent > SESSION_TIMEOUT_MS) {
// Timeout reached - close old session and start new one
sessionEnds.push(createSessionEndEvent(lastSession));
currentSession = startNewSession(event, eventTime, sessionStarts);
deviceSessions.set(event.device_id, currentSession);
} else {
// Continue existing session
currentSession = lastSession;
currentSession.lastEvent = event;
currentSession.eventCount++;
if (event.name === 'screen_view') {
currentSession.screenViewCount++;
currentSession.screenViews.add(event.path);
}
// Track revenue if present in properties
if (event.properties.revenue && typeof event.properties.revenue === 'number') {
currentSession.revenue += event.properties.revenue;
}
}
}
// Assign session_id to event
const eventWithSession = { ...event, session_id: currentSession.sessionId };
eventsWithSessions.push(eventWithSession);
}
// Step 4: Create session_end events for all remaining sessions
for (const session of deviceSessions.values()) {
sessionEnds.push(createSessionEndEvent(session));
}
// Step 5: Build session aggregates for sessions table
const sessionAggregates = Array.from(deviceSessions.values()).map(session =>
buildSessionAggregate(session)
);
return {
eventsWithSessions,
sessionStarts,
sessionEnds,
sessionAggregates,
};
}
function startNewSession(
event: OpenPanelEvent,
eventTime: number,
sessionStarts: OpenPanelEvent[]
): SessionState {
const sessionId = randomUUID();
// Create session_start event (1 second before first event)
const sessionStart: OpenPanelEvent = {
...event,
id: randomUUID(),
name: 'session_start',
session_id: sessionId,
created_at: formatClickHouseDate(new Date(eventTime - 1000)),
duration: 0,
};
sessionStarts.push(sessionStart);
const session: SessionState = {
sessionId,
firstEvent: event,
lastEvent: event,
eventCount: 1,
screenViewCount: event.name === 'screen_view' ? 1 : 0,
screenViews: new Set(event.name === 'screen_view' ? [event.path] : []),
revenue: typeof event.properties.revenue === 'number' ? event.properties.revenue : 0,
};
return session;
}
function createSessionEndEvent(session: SessionState): OpenPanelEvent {
const firstTime = new Date(session.firstEvent.created_at).getTime();
const lastTime = new Date(session.lastEvent.created_at).getTime();
const duration = lastTime - firstTime;
return {
...session.lastEvent, // Inherit most fields from last event
id: randomUUID(),
name: 'session_end',
session_id: session.sessionId,
created_at: formatClickHouseDate(new Date(lastTime + 1000)), // 1 second after
duration, // Total session duration
};
}
function formatClickHouseDate(date: Date): string {
return date.toISOString().replace('T', ' ').replace(/\.\d{3}Z$/, '');
}- 30-minute timeout: Measured from the timestamp of the last event in the session, not wall-clock time
- Session boundaries:
session_startis 1 second before first event,session_endis 1 second after last event - Event order: Must process events in chronological order per device for correct session assignment
- Duration field:
session_start: always 0session_end: time difference between first and last event in milliseconds- Regular events: usually 0 (unless you're tracking page view time)
The sessions table stores aggregated session metadata for fast querying. Track this inline during event processing:
interface SessionAggregate {
// Identity
id: string; // Same as session_id
profile_id: string;
device_id: string;
project_id: string;
// Counts
event_count: number; // Total events in session
screen_view_count: number; // Number of page views
screen_views: string[]; // Unique paths visited (as array)
// Entry/Exit
entry_path: string; // Path of first event
entry_origin: string; // Origin of first event
exit_path: string; // Path of last event
exit_origin: string; // Origin of last event
// Timing
created_at: string; // Timestamp of first event
ended_at: string; // Timestamp of last event
duration: number; // Session length in milliseconds
// Referrer (from first event)
referrer: string;
referrer_name: string;
referrer_type: string;
// Device info (from first event)
os: string;
os_version: string;
browser: string;
browser_version: string;
device: string;
brand: string;
model: string;
// Location (from first event)
country: string;
region: string;
city: string;
longitude: number | null;
latitude: number | null;
// Marketing attribution (extracted from first event's query params)
utm_source: string;
utm_medium: string;
utm_campaign: string;
utm_content: string;
utm_term: string;
// Analytics
is_bounce: boolean; // true if only 1 event in session
revenue: number; // Sum of revenue from all events
// ClickHouse-specific (use these exact values)
sign: 1; // Always 1 for imports
version: number; // Use current timestamp
properties: Record<string, string>; // Additional session properties
}function buildSessionAggregate(session: SessionState): SessionAggregate {
const firstEvent = session.firstEvent;
const lastEvent = session.lastEvent;
// Extract UTM parameters from first event's properties.__query
const query = firstEvent.properties.__query || {};
const utmSource = String(query.utm_source || '');
const utmMedium = String(query.utm_medium || '');
const utmCampaign = String(query.utm_campaign || '');
const utmContent = String(query.utm_content || '');
const utmTerm = String(query.utm_term || '');
return {
// Identity
id: session.sessionId,
profile_id: firstEvent.profile_id,
device_id: firstEvent.device_id,
project_id: firstEvent.project_id,
// Counts
event_count: session.eventCount,
screen_view_count: session.screenViewCount,
screen_views: Array.from(session.screenViews),
// Entry/Exit
entry_path: firstEvent.path,
entry_origin: firstEvent.origin,
exit_path: lastEvent.path,
exit_origin: lastEvent.origin,
// Timing
created_at: firstEvent.created_at,
ended_at: lastEvent.created_at,
duration: new Date(lastEvent.created_at).getTime() -
new Date(firstEvent.created_at).getTime(),
// Referrer (from first event)
referrer: firstEvent.referrer,
referrer_name: firstEvent.referrer_name,
referrer_type: firstEvent.referrer_type,
// Device info (from first event)
os: firstEvent.os,
os_version: firstEvent.os_version,
browser: firstEvent.browser,
browser_version: firstEvent.browser_version,
device: firstEvent.device,
brand: firstEvent.brand,
model: firstEvent.model,
// Location (from first event)
country: firstEvent.country,
region: firstEvent.region,
city: firstEvent.city,
longitude: firstEvent.longitude,
latitude: firstEvent.latitude,
// Marketing attribution
utm_source: utmSource,
utm_medium: utmMedium,
utm_campaign: utmCampaign,
utm_content: utmContent,
utm_term: utmTerm,
// Analytics
is_bounce: session.eventCount === 1,
revenue: session.revenue,
// ClickHouse metadata
sign: 1,
version: Date.now(),
properties: {},
};
}Use the official ClickHouse client to insert events and sessions in batches:
import { createClient } from '@clickhouse/client';
// Initialize ClickHouse client
const ch = createClient({
url: process.env.CLICKHOUSE_URL || 'http://localhost:8123',
// Optional: add authentication
username: process.env.CLICKHOUSE_USER,
password: process.env.CLICKHOUSE_PASSWORD,
});
async function insertEvents(events: OpenPanelEvent[]) {
const BATCH_SIZE = 5000;
for (let i = 0; i < events.length; i += BATCH_SIZE) {
const batch = events.slice(i, i + BATCH_SIZE);
await ch.insert({
table: 'events',
values: batch,
format: 'JSONEachRow',
});
console.log(`Inserted events ${i} to ${i + batch.length}`);
}
}
async function insertSessions(sessions: SessionAggregate[]) {
await ch.insert({
table: 'sessions',
values: sessions,
format: 'JSONEachRow',
});
console.log(`Inserted ${sessions.length} sessions`);
}- Events: 5,000-10,000 per batch for optimal performance
- Sessions: Can insert all at once (usually much fewer than events)
- Memory: Monitor memory usage with large imports
- Progress tracking: Log after each batch for monitoring
Here's a full working example that imports events from a CSV file:
import { createClient } from '@clickhouse/client';
import { parse } from 'csv-parse/sync';
import { readFileSync } from 'fs';
import { randomUUID } from 'crypto';
// Configuration
const PROJECT_ID = 'your-project-id-here';
const CLICKHOUSE_URL = process.env.CLICKHOUSE_URL || 'http://localhost:8123';
const CSV_FILE_PATH = './events-export.csv';
// Initialize ClickHouse
const ch = createClient({ url: CLICKHOUSE_URL });
interface RawEvent {
timestamp: string; // Your CSV timestamp column
visitor_id: string; // Your CSV visitor ID column
user_id?: string; // Your CSV user ID column (optional)
page_url: string; // Your CSV URL column
event_name: string; // Your CSV event name column
country?: string;
city?: string;
browser?: string;
os?: string;
device?: string;
// ... other columns from your CSV
}
async function main() {
console.log('Starting import...');
// Step 1: Read and parse CSV
const csvContent = readFileSync(CSV_FILE_PATH, 'utf-8');
const rawEvents: RawEvent[] = parse(csvContent, {
columns: true,
skip_empty_lines: true,
});
console.log(`Loaded ${rawEvents.length} raw events`);
// Step 2: Transform to OpenPanel format
const openPanelEvents: OpenPanelEvent[] = rawEvents.map(raw => {
const url = new URL(raw.page_url);
// Extract query parameters from URL
const query: Record<string, string> = {};
for (const [key, value] of url.searchParams.entries()) {
query[key] = value;
}
// Build properties object
const properties: Record<string, any> = {};
// Add query parameters (including UTM)
if (Object.keys(query).length > 0) {
properties.__query = query;
}
// Add page metadata if available
if (raw.page_title) properties.__title = raw.page_title;
if (raw.screen_resolution) properties.__screen = raw.screen_resolution;
if (raw.language) properties.__language = raw.language;
// Add custom properties from your data
if (raw.custom_field1) properties.custom_field1 = raw.custom_field1;
if (raw.custom_field2) properties.custom_field2 = raw.custom_field2;
return {
// Identity
id: randomUUID(),
device_id: raw.visitor_id,
profile_id: raw.user_id || raw.visitor_id, // Use user_id if available
session_id: '', // Will be assigned during session reconstruction
project_id: PROJECT_ID,
// Core
name: raw.event_name === 'pageview' ? 'screen_view' : raw.event_name,
created_at: formatClickHouseDate(new Date(raw.timestamp)),
path: url.pathname,
origin: url.origin,
referrer: '', // Extract from your data if available
referrer_name: '',
referrer_type: 'direct',
duration: 0,
properties, // Properties with query params and metadata
// Geo/Device
country: raw.country || '',
city: raw.city || '',
region: '',
longitude: null,
latitude: null,
os: raw.os || '',
os_version: '',
browser: raw.browser || '',
browser_version: '',
device: raw.device || 'desktop',
brand: '',
model: '',
// Metadata
imported_at: new Date().toISOString(),
sdk_name: 'manual-import',
sdk_version: '1.0.0',
};
});
console.log('Transformed events to OpenPanel format');
// Step 3: Reconstruct sessions
const {
eventsWithSessions,
sessionStarts,
sessionEnds,
sessionAggregates,
} = reconstructSessions(openPanelEvents);
console.log(`Created ${sessionAggregates.length} sessions`);
console.log(`Generated ${sessionStarts.length} session_start events`);
console.log(`Generated ${sessionEnds.length} session_end events`);
// Step 4: Combine all events
const allEvents = [
...sessionStarts,
...eventsWithSessions,
...sessionEnds,
];
console.log(`Total events to insert: ${allEvents.length}`);
// Step 5: Insert into ClickHouse
await insertEvents(allEvents);
await insertSessions(sessionAggregates);
console.log('Import complete!');
}
function formatClickHouseDate(date: Date): string {
return date.toISOString().replace('T', ' ').replace(/\.\d{3}Z$/, '');
}
async function insertEvents(events: OpenPanelEvent[]) {
const BATCH_SIZE = 5000;
for (let i = 0; i < events.length; i += BATCH_SIZE) {
const batch = events.slice(i, i + BATCH_SIZE);
await ch.insert({
table: 'events',
values: batch,
format: 'JSONEachRow',
});
console.log(`β Inserted events ${i + 1} to ${Math.min(i + BATCH_SIZE, events.length)}`);
}
}
async function insertSessions(sessions: SessionAggregate[]) {
if (sessions.length === 0) return;
await ch.insert({
table: 'sessions',
values: sessions,
format: 'JSONEachRow',
});
console.log(`β Inserted ${sessions.length} sessions`);
}
// Run the import
main().catch(console.error);- Install dependencies:
npm install @clickhouse/client csv-parse- Set environment variables:
export CLICKHOUSE_URL="http://your-clickhouse-host:8123"-
Update the configuration in the script:
- Set
PROJECT_IDto your OpenPanel project ID - Set
CSV_FILE_PATHto your data file - Adjust the
RawEventinterface to match your CSV columns - Update the transformation logic in the map function
- Set
-
Run the script:
npx tsx import-script.tsThe __query object is an exception to the flat structure rule:
// β
Correct: __query can contain nested objects
properties: {
"__query": {
"utm_source": "google", // Nested is OK here
"utm_medium": "cpc"
},
"custom_field": "value" // Other fields stay flat
}
// β Wrong: Don't flatten __query into dot notation
properties: {
"__query.utm_source": "google", // DON'T do this
"__query.utm_medium": "cpc"
}Always extract and include UTM parameters for proper attribution tracking. OpenPanel's dashboard relies on these for traffic source analysis.
ClickHouse expects dates in a specific format:
// β
Correct: YYYY-MM-DD HH:MM:SS
"2024-01-15 14:30:00"
// β Wrong: ISO 8601 with T and Z
"2024-01-15T14:30:00.000Z"
// Conversion function
function formatClickHouseDate(date: Date): string {
return date.toISOString()
.replace('T', ' ') // Remove T separator
.replace(/\.\d{3}Z$/, ''); // Remove milliseconds and Z
}- Optimal batch size: 5,000-10,000 events
- Too small: More network overhead, slower imports
- Too large: Memory issues, risk of timeout
- Monitor: Watch memory usage and insert time per batch
Exactly 30 minutes gap:
// If gap === 30 minutes exactly, continue existing session
if (timeSinceLastEvent > SESSION_TIMEOUT_MS) {
// Start new session
}Events at exact same timestamp:
- They belong to the same session
- Order doesn't matter as they're simultaneous
Out-of-order events:
- Always sort by
(device_id, created_at)before processing - Do NOT assume your source data is already sorted
If you're tracking identified users, you may want to create profile records:
// Optional: Upsert profiles in PostgreSQL
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
async function upsertProfiles(events: OpenPanelEvent[]) {
const uniqueProfiles = new Map<string, OpenPanelEvent>();
for (const event of events) {
// Only track identified users (where profile_id !== device_id)
if (event.profile_id !== event.device_id) {
if (!uniqueProfiles.has(event.profile_id)) {
uniqueProfiles.set(event.profile_id, event);
}
}
}
for (const [profileId, event] of uniqueProfiles) {
await prisma.profile.upsert({
where: {
projectId_id: {
projectId: event.project_id,
id: profileId,
},
},
create: {
id: profileId,
projectId: event.project_id,
firstName: '', // Extract from your data if available
lastName: '',
email: '',
},
update: {}, // Don't overwrite existing data
});
}
}OpenPanel uses a specific priority order for determining traffic sources:
Priority Order (highest to lowest):
- UTM parameters (
utm_source,ref,utm_referrerin query string) - Organic referrer (HTTP referrer header, if not same domain)
- Direct (no referrer information)
// Step 1: Check if referrer is from same domain (internal navigation)
function isSameDomain(referrerUrl: string | undefined, currentUrl: string | undefined): boolean {
if (!referrerUrl || !currentUrl) return false;
try {
const referrerHost = new URL(referrerUrl).hostname;
const currentHost = new URL(currentUrl).hostname;
return referrerHost === currentHost;
} catch {
return false;
}
}
// Step 2: Parse organic referrer (if not same domain)
function parseReferrer(url: string | undefined): {
name: string;
type: string;
url: string;
} | null {
if (!url) return null;
try {
const hostname = new URL(url).hostname;
// Check against known referrer database (simplified)
// In production, use a comprehensive list like apps/worker/src/referrers.ts
const knownReferrers: Record<string, { name: string; type: string }> = {
'google.com': { name: 'Google', type: 'search' },
'bing.com': { name: 'Bing', type: 'search' },
'yahoo.com': { name: 'Yahoo', type: 'search' },
'duckduckgo.com': { name: 'DuckDuckGo', type: 'search' },
'facebook.com': { name: 'Facebook', type: 'social' },
'twitter.com': { name: 'Twitter', type: 'social' },
'linkedin.com': { name: 'LinkedIn', type: 'social' },
'reddit.com': { name: 'Reddit', type: 'social' },
't.co': { name: 'Twitter', type: 'social' },
};
const match = knownReferrers[hostname] || knownReferrers[hostname.replace('www.', '')];
return {
name: match?.name || '',
type: match?.type || 'unknown',
url: url.replace(/\/$/, ''), // Strip trailing slash
};
} catch {
return null;
}
}
// Step 3: Check for UTM referrer (HIGHEST PRIORITY)
function getReferrerFromQuery(query: Record<string, string> | undefined): {
name: string;
type: string;
url: string;
} | null {
if (!query) return null;
// Check multiple query param variants
const source = query.utm_source || query.ref || query.utm_referrer || '';
if (!source) return null;
// Try to match against known referrers
const knownReferrers: Record<string, { name: string; type: string }> = {
google: { name: 'Google', type: 'search' },
bing: { name: 'Bing', type: 'search' },
facebook: { name: 'Facebook', type: 'social' },
twitter: { name: 'Twitter', type: 'social' },
linkedin: { name: 'LinkedIn', type: 'social' },
};
const match = knownReferrers[source.toLowerCase()];
return {
name: match?.name || source, // Use source as name if not recognized
type: match?.type || 'unknown',
url: '', // UTM referrers don't have a URL
};
}
// Step 4: Apply referrer logic with correct priority
function determineReferrer(
organicReferrer: string | undefined,
currentUrl: string | undefined,
query: Record<string, string> | undefined
): {
referrer: string;
referrer_name: string;
referrer_type: string;
} {
// Priority 1: UTM parameters (highest priority)
const utmReferrer = getReferrerFromQuery(query);
if (utmReferrer) {
return {
referrer: utmReferrer.url,
referrer_name: utmReferrer.name,
referrer_type: utmReferrer.type,
};
}
// Priority 2: Organic referrer (if not same domain)
if (organicReferrer && !isSameDomain(organicReferrer, currentUrl)) {
const parsed = parseReferrer(organicReferrer);
if (parsed) {
return {
referrer: parsed.url,
referrer_name: parsed.name,
referrer_type: parsed.type,
};
}
}
// Priority 3: Direct traffic (no referrer)
return {
referrer: '',
referrer_name: '',
referrer_type: '',
};
}Important notes:
- UTM parameters always override organic referrers
- Same-domain referrers are treated as direct traffic (internal navigation)
- The
referrerfield stores the URL,referrer_namestores human-readable name - Use OpenPanel's referrer database for accurate classification (see
apps/worker/src/referrers.ts) - Support multiple UTM variants:
utm_source,ref,utm_referrer
Some platforms prefix region codes with country codes:
function normalizeRegion(region: string): string {
// Remove "US-" or "GB-" prefix if present
return region.replace(/^[A-Z]{2}-/, '');
}
// Example:
// "US-CA" β "CA"
// "GB-ENG" β "ENG"
// "CA" β "CA" (unchanged)After importing, verify your data is correct:
-- Total events imported
SELECT count(*) as total_events
FROM events
WHERE project_id = 'your-project-id'
AND imported_at IS NOT NULL;
-- Events by type
SELECT name, count(*) as count
FROM events
WHERE project_id = 'your-project-id'
AND imported_at IS NOT NULL
GROUP BY name
ORDER BY count DESC;-- Total sessions
SELECT count(*) as total_sessions
FROM sessions
WHERE project_id = 'your-project-id';
-- Session duration distribution
SELECT
round(duration / 1000) as duration_seconds,
count(*) as count
FROM sessions
WHERE project_id = 'your-project-id'
GROUP BY duration_seconds
ORDER BY duration_seconds;
-- Sessions with events
SELECT
s.id,
s.event_count,
count(e.id) as actual_events
FROM sessions s
LEFT JOIN events e ON e.session_id = s.id
WHERE s.project_id = 'your-project-id'
GROUP BY s.id, s.event_count
HAVING actual_events != s.event_count
LIMIT 10;-- Events without session_id (should be 0)
SELECT count(*) as orphaned_events
FROM events
WHERE project_id = 'your-project-id'
AND session_id = ''
AND name NOT IN ('session_start', 'session_end');
-- Events with session_id but no matching session
SELECT count(*) as events_without_session
FROM events e
LEFT JOIN sessions s ON e.session_id = s.id
WHERE e.project_id = 'your-project-id'
AND e.session_id != ''
AND s.id IS NULL;-- Check session_start timing (should be 1 second before first event)
SELECT
s.id,
s.created_at as session_start,
min(e.created_at) as first_event,
dateDiff('second', s.created_at, min(e.created_at)) as diff_seconds
FROM sessions s
JOIN events e ON e.session_id = s.id AND e.name != 'session_start'
WHERE s.project_id = 'your-project-id'
GROUP BY s.id, s.created_at
HAVING diff_seconds != 1
LIMIT 10;
-- Check session_end timing (should be 1 second after last event)
SELECT
s.id,
s.ended_at as session_end,
max(e.created_at) as last_event,
dateDiff('second', max(e.created_at), s.ended_at) as diff_seconds
FROM sessions s
JOIN events e ON e.session_id = s.id AND e.name != 'session_end'
WHERE s.project_id = 'your-project-id'
GROUP BY s.id, s.ended_at
HAVING diff_seconds != 1
LIMIT 10;-- Look at a few complete sessions
SELECT
e.session_id,
e.name,
e.created_at,
e.path
FROM events e
WHERE e.project_id = 'your-project-id'
AND e.session_id IN (
SELECT id FROM sessions
WHERE project_id = 'your-project-id'
LIMIT 5
)
ORDER BY e.session_id, e.created_at;Issue: "Events have no session_id"
- Cause: Session reconstruction didn't run or failed
- Fix: Ensure events are sorted by (device_id, created_at) before processing
Issue: "Too many sessions (one per event)"
- Cause: Session timeout logic not working
- Fix: Verify time comparison uses milliseconds, not seconds
- Fix: Ensure
device_idis consistent for same user
Issue: "Session duration is negative"
- Cause: Events out of chronological order
- Fix: Sort events before processing:
.sort((a, b) => new Date(a.created_at) - new Date(b.created_at))
Issue: "ClickHouse insert fails with date format error"
- Cause: Using ISO 8601 format with T and Z
- Fix: Use
YYYY-MM-DD HH:MM:SSformat (no T, no Z, no milliseconds)
Issue: "Memory error with large imports"
- Cause: Loading all events into memory at once
- Fix: Process events in batches of 10,000-50,000
- Fix: Use streaming CSV parser instead of loading entire file
Once you've successfully imported your data:
- Verify in OpenPanel Dashboard: Check that events and sessions appear correctly
- Set up retention: Configure data retention policies if needed
- Monitor performance: Watch query performance with imported data
- Document your mapping: Keep notes on how you mapped fields for future imports
- Automate if needed: Schedule regular imports if you have ongoing data
- OpenPanel Documentation
- ClickHouse Client Documentation
- Session Reconstruction in OpenPanel - See
packages/db/src/services/import.service.ts
If you encounter issues with manual import:
- Check the validation queries above to identify the problem
- Review your transformation logic for correctness
- Verify your ClickHouse connection and permissions
- Open an issue on GitHub with:
- Sample of your source data (anonymized)
- Your transformation code
- Error messages or unexpected results
Happy importing! π