@pedramamini
Last active June 14, 2025 18:49
JSON Structure Analyzer and Reducer
#!/usr/bin/env python3
"""
JSON Structure Analyzer and Reducer
===================================

A powerful command-line tool for analyzing, reducing, and exploring JSON file structures.
Designed to help developers understand large JSON datasets by providing multiple modes
of interaction: structure reduction for LLM analysis, hierarchical size analysis, and
interactive terminal-based exploration.

WHAT THIS PROGRAM DOES
======================

This tool addresses the common problem of working with large JSON files that exceed
context limits for Large Language Models (LLMs) or are simply too large to understand
at a glance. It provides three main modes of operation:

1. **Structure Reduction Mode (Default)**
   - Reduces JSON files while preserving structure and examples
   - Truncates long strings to manageable lengths (1000 chars)
   - Reduces arrays and array-like dictionaries to N representative items
   - Detects and handles GUID-based dictionaries intelligently
   - Perfect for feeding to LLMs while maintaining structural understanding

2. **Size Analysis Mode (-a/--analyzer)**
   - Provides hierarchical analysis of JSON structure with byte sizes
   - Shows size distribution at specified depth levels
   - Helps identify the largest data sections quickly
   - Displays results in a tree format with human-readable sizes

3. **Interactive TUI Mode (-t/--tui)**
   - Terminal User Interface for real-time JSON exploration
   - Split-panel design: tree navigation + value display
   - Keyboard-driven navigation with expand/collapse functionality
   - Syntax highlighting and smart size formatting
   - Perfect for understanding complex nested structures

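To illustrate what structure reduction does to a document, here is a minimal standalone sketch. It is a simplified stand-in rather than this script's implementation: the helper name `reduce_sketch` and the sample data are made up for the example, and it only shortens lists and long strings (no array-like or GUID-like dictionary detection).

```python
import json

MAX_STRING = 1000  # truncation length documented for reduction mode


def reduce_sketch(data, max_items=1):
    # Hypothetical helper: keep the first max_items list elements and
    # truncate long strings, recursing through dicts and lists.
    if isinstance(data, dict):
        return {key: reduce_sketch(value, max_items) for key, value in data.items()}
    if isinstance(data, list):
        kept = [reduce_sketch(item, max_items) for item in data[:max_items]]
        if len(data) > max_items:
            # Leave a marker so the reader knows items were dropped
            kept.append(f"... (array had {len(data)} total items)")
        return kept
    if isinstance(data, str) and len(data) > MAX_STRING:
        return data[:MAX_STRING] + "... (truncated)"
    return data


sample = {"users": [{"id": 1}, {"id": 2}, {"id": 3}], "note": "x" * 1200}
reduced = reduce_sketch(sample)
print(json.dumps(reduced["users"]))
```

The marker string appended to a shortened list keeps the original length visible, which is usually what a reader or an LLM needs in order to judge how representative the kept items are.
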
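KEY FEATURES
============

• **Smart Dictionary Detection**: Distinguishes between semantic dictionaries and
  array-like dictionaries (sequential keys) or GUID-based collections
• **Configurable Item Count**: -d/--depth sets how many items to keep in arrays and
  array-like or GUID-like dictionaries (despite its name, it does not limit nesting depth)
• **Lazy TUI Tree**: Child nodes are built only when a node is expanded; the JSON file
  itself is always loaded fully into memory
• **Multiple Output Formats**: JSON, tabular analysis, or interactive exploration
• **Size-Aware Analysis**: Analyzer and TUI modes report the UTF-8 byte size of each
  value's compact JSON serialization
• **Portable**: Requires Python 3.6+ and the curses module (standard on Linux and macOS;
  on Windows it usually has to be installed, e.g. via the third-party windows-curses package)
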
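The key-based detection described above can be sketched as follows. This is an illustrative variant, not the script's own code: the function names are hypothetical, keys are assumed to be strings (as they always are after `json.load`), and the UUID check uses a strict regular expression, which may accept a slightly different set of keys than the tool does.

```python
import re

# Strict 8-4-4-4-12 hex layout (an assumption of this sketch)
UUID_RE = re.compile("[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def keys_are_sequential(keys):
    # True when the keys are the integers 0..n-1 or 1..n written as digit strings
    if not keys or not all(re.fullmatch("[0-9]+", k) for k in keys):
        return False
    numbers = sorted(int(k) for k in keys)
    n = len(numbers)
    return numbers == list(range(n)) or numbers == list(range(1, n + 1))


def keys_are_mostly_uuids(keys, threshold=0.8):
    # True when at least `threshold` of the keys look like UUIDs
    if not keys:
        return False
    hits = sum(1 for k in keys if UUID_RE.fullmatch(k))
    return hits / len(keys) >= threshold


print(keys_are_sequential(["0", "1", "2"]))
print(keys_are_sequential(["id", "name", "tags"]))
print(keys_are_mostly_uuids(["123e4567-e89b-12d3-a456-426614174000"]))
```

A dictionary whose keys pass either check carries no meaning in the keys themselves, so keeping only the first few entries loses little structural information.
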
USAGE EXAMPLES
==============

# Basic structure reduction (keep 1 item per array or array-like/GUID-like dict)
json-structure large-data.json

# Keep more examples for better structure understanding
json-structure -d 5 large-data.json

# Analyze size distribution at different depths
json-structure -a 1 large-data.json   # Top-level only
json-structure -a 3 large-data.json   # 3 levels deep

# Interactive exploration
json-structure -t large-data.json

# The modes are mutually exclusive: -t takes precedence over -a, and -d only
# applies to reduction mode, so -d 3 is ignored in the command below
json-structure -d 3 -a 2 api-response.json

CODE ORGANIZATION
=================

Function Hierarchy and Edit Locations:

**CORE DATA PROCESSING**
├── truncate_string() → Modify string truncation behavior
├── get_json_size() → Change size calculation method
├── format_size() / format_size_kb() → Adjust size display formats
└── reduce_json_structure() → **Main reduction logic - edit for new reduction rules**

**SMART DETECTION SYSTEM**
├── is_array_like_dict() → Modify sequential key detection (0,1,2...)
└── is_guid_like_dict() → **Edit GUID/UUID pattern recognition**

**SIZE ANALYSIS MODE**
├── analyze_json_structure() → Change depth-based analysis logic
└── print_analysis() → Modify tabular output format

**INTERACTIVE TUI MODE**
├── TreeNode class → Edit node behavior and data structure
│   ├── build_children() → Modify lazy loading logic
│   ├── toggle_expand() → Change expand/collapse behavior
│   └── get_display_name() → Adjust node naming
├── JSONTreeTUI class → **Main TUI interface - edit for UI changes**
│   ├── init_colors() → Modify color scheme
│   ├── draw_tree() → Change left panel tree rendering
│   ├── draw_value_panel() → **Edit right panel value display**
│   ├── format_value_for_display() → Modify value formatting logic
│   └── handle_input() → **Add new keyboard shortcuts**
└── run_tui() → TUI initialization and error handling

**APPLICATION ENTRY POINT**
└── main() → **Edit for new command-line options**
    ├── Argument parsing → Add new CLI flags
    ├── Mode selection → Route to different processing modes
    └── Error handling → Modify error messages and exit codes
"""
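import argparse
import json
import sys
from typing import Any, List, Optional, Tuple

try:
    import curses
except ImportError:
    # The Windows build of Python does not ship the curses module. Reduction
    # and analyzer modes do not need it; only TUI mode (-t) does.
    curses = None

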
def truncate_string(value, max_length=1000):
    """Truncate strings to keep them manageable while preserving content examples"""
    if isinstance(value, str) and len(value) > max_length:
        return value[:max_length] + "... (truncated)"
    return value


def is_array_like_dict(data):
    """
    Detect if a dictionary is actually an array in disguise.
    Returns True if keys are sequential integers starting from 0 or 1.
    """
    if not isinstance(data, dict) or len(data) < 3:  # Need at least 3 items to be worth reducing
        return False
    keys = list(data.keys())
    # Check if all keys are string representations of integers
    try:
        int_keys = [int(k) for k in keys]
    except (ValueError, TypeError):
        return False
    # Sort the integer keys
    int_keys.sort()
    # Check if they form a sequence starting from 0 or 1
    if int_keys == list(range(len(int_keys))):  # 0, 1, 2, 3...
        return True
    elif int_keys == list(range(1, len(int_keys) + 1)):  # 1, 2, 3, 4...
        return True
    return False


def is_guid_like_dict(data):
    """
    Detect if a dictionary has GUID-like keys (UUIDs, hashes, etc.).
    Returns True if keys appear to be auto-generated identifiers.
    """
    if not isinstance(data, dict) or len(data) < 3:
        return False
    keys = list(data.keys())
    # Count keys matching the UUID pattern (8-4-4-4-12 hex digits with dashes)
    # and keys that look like hex hashes
    uuid_pattern_count = 0
    hash_like_count = 0
    for key in keys:
        if not isinstance(key, str):
            continue
        # UUID pattern: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        if len(key) == 36 and key.count('-') == 4:
            parts = key.split('-')
            if (len(parts) == 5 and
                    len(parts[0]) == 8 and len(parts[1]) == 4 and
                    len(parts[2]) == 4 and len(parts[3]) == 4 and len(parts[4]) == 12):
                try:
                    # Check if all parts are hex
                    for part in parts:
                        int(part, 16)
                    uuid_pattern_count += 1
                    continue
                except ValueError:
                    pass
        # Hash-like pattern: long hex strings (20 to 64 chars, all hex)
        if 20 <= len(key) <= 64:
            try:
                int(key, 16)
                hash_like_count += 1
                continue
            except ValueError:
                pass
    # If most keys match GUID/hash patterns, consider it GUID-like
    total_keys = len(keys)
    guid_ratio = (uuid_pattern_count + hash_like_count) / total_keys
    return guid_ratio >= 0.8  # At least 80% of keys are GUID-like


def reduce_json_structure(data, max_items=1):
    """
    Reduce JSON structure by:
    1. Truncating long strings to 1000 chars
    2. Reducing arrays to max_items representative elements
    3. Detecting array-like and GUID-like dictionaries and reducing them too
    4. Preserving all other object keys and structure

    Args:
        data: The JSON data to reduce
        max_items: Maximum number of items to keep in arrays and array-like dicts
    """
    if isinstance(data, dict):
        # Check if this dictionary is actually an array in disguise or has GUID-like keys
        if is_array_like_dict(data):
            # Treat it like an array - keep first max_items entries in numeric key order
            keys = sorted(data.keys(), key=int)
            reduced_data = {}
            for key in keys[:max_items]:
                reduced_data[key] = reduce_json_structure(data[key], max_items)
            # Add indicator if we truncated
            if len(keys) > max_items:
                reduced_data["..."] = f"(array-like dict had {len(keys)} total entries)"
            return reduced_data
        elif is_guid_like_dict(data):
            # GUID-like dictionary - keep first max_items entries
            keys = list(data.keys())
            reduced_data = {}
            for key in keys[:max_items]:
                reduced_data[key] = reduce_json_structure(data[key], max_items)
            # Add indicator if we truncated
            if len(keys) > max_items:
                reduced_data["..."] = f"(GUID-like dict had {len(keys)} total entries)"
            return reduced_data
        else:
            # Regular dictionary - keep all keys
            reduced_data = {}
            for key, value in data.items():
                reduced_data[key] = reduce_json_structure(value, max_items)
            return reduced_data
    elif isinstance(data, list):
        if not data:
            return []
        # Keep first max_items elements
        reduced_items = []
        for item in data[:max_items]:
            reduced_items.append(reduce_json_structure(item, max_items))
        # Add metadata about the original array size if we truncated
        if len(data) > max_items:
            reduced_items.append(f"... (array had {len(data)} total items)")
        return reduced_items
    else:
        # Apply string truncation to leaf values
        return truncate_string(data)


def get_json_size(data):
    """Get the byte size of JSON data when serialized"""
    return len(json.dumps(data, separators=(',', ':')).encode('utf-8'))


def format_size(size_bytes):
    """Format size in human-readable format"""
    if size_bytes < 1024:
        return f"{size_bytes}B"
    elif size_bytes < 1024 * 1024:
        return f"{size_bytes/1024:.1f}KB"
    else:
        return f"{size_bytes/(1024*1024):.1f}MB"


def format_size_kb(size_bytes):
    """Format size in KB with appropriate decimal precision for TUI"""
    kb = size_bytes / 1024.0
    if kb < 0.1:
        return f"{kb:.3f}KB"
    elif kb < 1.0:
        return f"{kb:.2f}KB"
    elif kb < 10.0:
        return f"{kb:.1f}KB"
    else:
        return f"{kb:.0f}KB"


class TreeNode:
    """Represents a node in the JSON tree structure"""

    def __init__(self, key: str, value: Any, path: str = "", parent: Optional['TreeNode'] = None):
        self.key = key
        self.value = value
        self.path = path
        self.parent = parent
        self.children: List['TreeNode'] = []
        self.expanded = False
        self.size = get_json_size(value)
        self.data_type = self._get_data_type()
        self.count = self._get_count()

    def _get_data_type(self) -> str:
        """Get the data type string for display"""
        if isinstance(self.value, dict):
            return "dict"
        elif isinstance(self.value, list):
            return "array"
        elif isinstance(self.value, str):
            return "string"
        elif isinstance(self.value, bool):
            # bool must be tested before int/float: bool is a subclass of int,
            # so True/False would otherwise be reported as "number"
            return "boolean"
        elif isinstance(self.value, (int, float)):
            return "number"
        elif self.value is None:
            return "null"
        else:
            return "unknown"

    def _get_count(self) -> int:
        """Get count for display (length for containers, char count for strings)"""
        if isinstance(self.value, (dict, list)):
            return len(self.value)
        elif isinstance(self.value, str):
            return len(self.value)
        else:
            return 1

    def build_children(self):
        """Build child nodes if not already built"""
        if self.children or not isinstance(self.value, (dict, list)):
            return
        if isinstance(self.value, dict):
            for key, val in self.value.items():
                child_path = f"{self.path}.{key}" if self.path else key
                child = TreeNode(str(key), val, child_path, self)
                self.children.append(child)
        elif isinstance(self.value, list):
            for i, val in enumerate(self.value):
                child_path = f"{self.path}[{i}]"
                child = TreeNode(f"[{i}]", val, child_path, self)
                self.children.append(child)

    def toggle_expand(self):
        """Toggle expansion state"""
        if isinstance(self.value, (dict, list)) and len(self.value) > 0:
            self.expanded = not self.expanded
            if self.expanded:
                self.build_children()

    def get_display_name(self) -> str:
        """Get the display name for this node"""
        if self.key == "<root>":
            return "<root>"
        return self.key

    def is_expandable(self) -> bool:
        """Check if this node can be expanded"""
        return isinstance(self.value, (dict, list)) and len(self.value) > 0


class JSONTreeTUI:
    """Terminal User Interface for JSON tree visualization"""

    def __init__(self, data: Any, filename: str):
        self.root = TreeNode("<root>", data)
        self.filename = filename
        self.current_node = 0
        self.scroll_offset = 0
        self.visible_nodes: List[TreeNode] = []
        self.total_size = get_json_size(data)
        # Color pairs
        self.COLOR_NORMAL = 1
        self.COLOR_SELECTED = 2
        self.COLOR_DICT = 3
        self.COLOR_ARRAY = 4
        self.COLOR_STRING = 5
        self.COLOR_NUMBER = 6
        self.COLOR_BOOLEAN = 7
        self.COLOR_NULL = 8
        self.COLOR_SIZE = 9
        self.COLOR_STATUS = 10

    def init_colors(self):
        """Initialize color pairs"""
        curses.start_color()
        curses.use_default_colors()
        curses.init_pair(self.COLOR_NORMAL, curses.COLOR_WHITE, -1)
        curses.init_pair(self.COLOR_SELECTED, curses.COLOR_BLACK, curses.COLOR_WHITE)
        curses.init_pair(self.COLOR_DICT, curses.COLOR_CYAN, -1)
        curses.init_pair(self.COLOR_ARRAY, curses.COLOR_YELLOW, -1)
        curses.init_pair(self.COLOR_STRING, curses.COLOR_GREEN, -1)
        curses.init_pair(self.COLOR_NUMBER, curses.COLOR_BLUE, -1)
        curses.init_pair(self.COLOR_BOOLEAN, curses.COLOR_MAGENTA, -1)
        curses.init_pair(self.COLOR_NULL, curses.COLOR_RED, -1)
        curses.init_pair(self.COLOR_SIZE, curses.COLOR_WHITE, -1)
        curses.init_pair(self.COLOR_STATUS, curses.COLOR_BLACK, curses.COLOR_CYAN)

    def get_type_color(self, data_type: str) -> int:
        """Get color pair for data type"""
        color_map = {
            "dict": self.COLOR_DICT,
            "array": self.COLOR_ARRAY,
            "string": self.COLOR_STRING,
            "number": self.COLOR_NUMBER,
            "boolean": self.COLOR_BOOLEAN,
            "null": self.COLOR_NULL,
        }
        return color_map.get(data_type, self.COLOR_NORMAL)

    def collect_visible_nodes(self, node: TreeNode, depth: int = 0) -> List[Tuple[TreeNode, int]]:
        """Collect all visible nodes with their depth"""
        result = [(node, depth)]
        if node.expanded:
            for child in node.children:
                result.extend(self.collect_visible_nodes(child, depth + 1))
        return result

    def get_tree_prefix(self, depth: int, is_last: bool, parent_prefixes: List[bool]) -> str:
        """Generate tree prefix with box-drawing characters"""
        if depth == 0:
            return ""
        prefix = ""
        # Each level is 4 columns wide so continuation bars line up with the branches
        for i in range(depth - 1):
            if parent_prefixes[i]:
                prefix += "│   "
            else:
                prefix += "    "
        if is_last:
            prefix += "└── "
        else:
            prefix += "├── "
        return prefix

    def draw_tree(self, stdscr):
        """Draw the tree structure with split panel layout"""
        height, width = stdscr.getmaxyx()
        tree_height = height - 2  # Reserve space for status bar
        # Calculate panel widths (60% tree, 40% value panel)
        tree_width = int(width * 0.6)
        value_width = width - tree_width - 1  # -1 for separator
        # Collect visible nodes
        visible_with_depth = self.collect_visible_nodes(self.root)
        self.visible_nodes = [node for node, _ in visible_with_depth]
        # Calculate scroll bounds
        if self.current_node >= len(self.visible_nodes):
            self.current_node = len(self.visible_nodes) - 1
        if self.current_node < 0:
            self.current_node = 0
        # Adjust scroll offset
        if self.current_node < self.scroll_offset:
            self.scroll_offset = self.current_node
        elif self.current_node >= self.scroll_offset + tree_height:
            self.scroll_offset = self.current_node - tree_height + 1
        # Clear screen
        stdscr.clear()
        # Draw tree panel
        for i in range(tree_height):
            node_index = self.scroll_offset + i
            if node_index >= len(visible_with_depth):
                break
            node, depth = visible_with_depth[node_index]
            # Determine if this is the last child at its level
            is_last = True
            if node.parent:
                siblings = node.parent.children
                is_last = siblings[-1] == node
            # Build parent prefixes for tree drawing
            parent_prefixes = []
            current = node.parent
            while current and current.parent:  # Skip root
                if current.parent.children[-1] != current:
                    parent_prefixes.insert(0, True)
                else:
                    parent_prefixes.insert(0, False)
                current = current.parent
            # Generate tree prefix
            tree_prefix = self.get_tree_prefix(depth, is_last, parent_prefixes)
            # Expansion indicator
            if node.is_expandable():
                expand_char = "▼" if node.expanded else "▶"
            else:
                expand_char = " "
            # Node display name
            display_name = node.get_display_name()
            # Size information
            size_str = format_size_kb(node.size)
            # Type and count info
            type_info = f"({node.data_type}"
            if node.count > 1 or node.data_type in ["dict", "array"]:
                type_info += f", {node.count}"
            type_info += ")"
            # Build display line
            line = f"{tree_prefix}{expand_char} {display_name} {size_str} {type_info}"
            # Truncate to fit tree panel
            if len(line) > tree_width - 1:
                line = line[:tree_width - 4] + "..."
            # Determine colors
            if node_index == self.current_node:
                attr = curses.color_pair(self.COLOR_SELECTED) | curses.A_BOLD
            else:
                attr = curses.color_pair(self.get_type_color(node.data_type))
            try:
                stdscr.addstr(i, 0, line, attr)
            except curses.error:
                pass  # Ignore errors from writing to edge of screen
        # Draw vertical separator
        for i in range(tree_height):
            try:
                stdscr.addstr(i, tree_width, "│", curses.color_pair(self.COLOR_NORMAL))
            except curses.error:
                pass
        # Draw value panel
        self.draw_value_panel(stdscr, tree_width + 1, value_width, tree_height)
        # Draw status bar
        self.draw_status_bar(stdscr)
        stdscr.refresh()

    def draw_value_panel(self, stdscr, start_x, panel_width, panel_height):
        """Draw the value panel showing the current node's content"""
        if self.current_node >= len(self.visible_nodes):
            return
        current_node = self.visible_nodes[self.current_node]
        # Panel header
        header = f" Value: {current_node.get_display_name()} "
        if len(header) > panel_width:
            header = header[:panel_width - 3] + "..."
        try:
            stdscr.addstr(0, start_x, header,
                          curses.color_pair(self.COLOR_STATUS) | curses.A_BOLD)
        except curses.error:
            pass
        # Format the value for display
        value_lines = self.format_value_for_display(current_node.value, panel_width - 2)
        # Display value lines
        for i, line in enumerate(value_lines[:panel_height - 2]):  # Leave space for header
            display_line = line[:panel_width - 1]  # Ensure it fits
            # Determine color based on content
            attr = curses.color_pair(self.COLOR_NORMAL)
            if line.strip().startswith('"') and line.strip().endswith('"'):
                attr = curses.color_pair(self.COLOR_STRING)
            elif line.strip() in ['true', 'false']:
                attr = curses.color_pair(self.COLOR_BOOLEAN)
            elif line.strip() == 'null':
                attr = curses.color_pair(self.COLOR_NULL)
            elif line.strip().replace('.', '').replace('-', '').isdigit():
                attr = curses.color_pair(self.COLOR_NUMBER)
            elif line.strip().startswith('{') or line.strip().startswith('['):
                attr = curses.color_pair(self.COLOR_DICT if line.strip().startswith('{') else self.COLOR_ARRAY)
            try:
                stdscr.addstr(i + 1, start_x + 1, display_line, attr)
            except curses.error:
                pass

    def format_value_for_display(self, value, max_width):
        """Format a value for display in the value panel"""
        # Guard against very narrow panels: the wrapping loops below make no
        # progress (and never terminate) when max_width is 1 or less
        max_width = max(max_width, 4)
        if isinstance(value, (dict, list)):
            # For containers, show formatted JSON
            try:
                json_str = json.dumps(value, indent=2, ensure_ascii=False)
                lines = json_str.split('\n')
                # Truncate very long output
                if len(lines) > 50:
                    lines = lines[:47] + ['...', f'({len(lines) - 47} more lines)']
                # Hard-wrap long lines at max_width characters
                wrapped_lines = []
                for line in lines:
                    if len(line) <= max_width:
                        wrapped_lines.append(line)
                    else:
                        while len(line) > max_width:
                            wrapped_lines.append(line[:max_width])
                            line = ' ' + line[max_width:]  # Indent continuation
                        if line.strip():
                            wrapped_lines.append(line)
                return wrapped_lines
            except Exception:
                return [f"<{type(value).__name__} with {len(value)} items>"]
        elif isinstance(value, str):
            # For strings, show with quotes and handle long strings
            if len(value) > 1000:
                display_value = value[:1000] + "... (truncated)"
            else:
                display_value = value
            # Add quotes and wrap lines
            quoted_value = json.dumps(display_value, ensure_ascii=False)
            lines = []
            current_line = ""
            for char in quoted_value:
                if len(current_line) >= max_width - 1:
                    lines.append(current_line)
                    current_line = ""
                current_line += char
            if current_line:
                lines.append(current_line)
            return lines
        else:
            # For primitives, show as JSON
            json_str = json.dumps(value, ensure_ascii=False)
            if len(json_str) <= max_width:
                return [json_str]
            else:
                # Split long primitive values
                lines = []
                while len(json_str) > max_width:
                    lines.append(json_str[:max_width])
                    json_str = json_str[max_width:]
                if json_str:
                    lines.append(json_str)
                return lines

    def draw_status_bar(self, stdscr):
        """Draw the status bar at the bottom"""
        height, width = stdscr.getmaxyx()
        status_y = height - 1
        # Current path
        current_path = ""
        if self.current_node < len(self.visible_nodes):
            current_path = self.visible_nodes[self.current_node].path or "<root>"
        # Status information
        total_size_str = format_size_kb(self.total_size)
        status_left = f"Path: {current_path}"
        status_right = f"Size: {total_size_str} | ↑↓:Navigate Enter:Expand/Collapse q:Quit"
        # Truncate left side if needed
        available_width = width - len(status_right) - 3
        if len(status_left) > available_width:
            status_left = status_left[:available_width - 3] + "..."
        # Create full status line
        padding = width - len(status_left) - len(status_right)
        status_line = status_left + " " * padding + status_right
        try:
            stdscr.addstr(status_y, 0, status_line[:width],
                          curses.color_pair(self.COLOR_STATUS) | curses.A_BOLD)
        except curses.error:
            pass

    def handle_input(self, key: int) -> bool:
        """Handle keyboard input. Returns False to quit."""
        if key == ord('q') or key == ord('Q'):
            return False
        elif key == curses.KEY_UP:
            if self.current_node > 0:
                self.current_node -= 1
        elif key == curses.KEY_DOWN:
            if self.current_node < len(self.visible_nodes) - 1:
                self.current_node += 1
        elif key == curses.KEY_ENTER or key == 10 or key == 13:
            if self.current_node < len(self.visible_nodes):
                self.visible_nodes[self.current_node].toggle_expand()
        elif key == curses.KEY_LEFT:
            # Collapse current node or go to parent
            if self.current_node < len(self.visible_nodes):
                node = self.visible_nodes[self.current_node]
                if node.expanded:
                    node.expanded = False
                elif node.parent and node.parent != self.root:
                    # Find parent in visible nodes
                    for i, visible_node in enumerate(self.visible_nodes):
                        if visible_node == node.parent:
                            self.current_node = i
                            break
        elif key == curses.KEY_RIGHT:
            # Expand current node
            if self.current_node < len(self.visible_nodes):
                node = self.visible_nodes[self.current_node]
                if node.is_expandable() and not node.expanded:
                    node.toggle_expand()
        return True

    def run(self, stdscr):
        """Main TUI loop"""
        self.init_colors()
        try:
            curses.curs_set(0)  # Hide cursor
        except curses.error:
            pass  # Some terminals do not support changing cursor visibility
        stdscr.keypad(True)  # Enable special keys
        while True:
            self.draw_tree(stdscr)
            key = stdscr.getch()
            if not self.handle_input(key):
                break


def run_tui(data: Any, filename: str):
    """Run the TUI application"""
    if curses is None:
        raise RuntimeError("TUI mode requires the curses module")
    tui = JSONTreeTUI(data, filename)
    curses.wrapper(tui.run)


def analyze_json_structure(data, path="", current_depth=0, target_depth=1):
    """
    Analyze JSON structure up to specified depth and return size information.
    Returns a list of (path, size, type, count, depth) tuples.
    """
    results = []
    current_size = get_json_size(data)
    if isinstance(data, dict):
        # Add info about this dictionary
        results.append((path, current_size, "dict", len(data), current_depth))
        # Only recurse if we haven't reached target depth
        if current_depth < target_depth:
            for key, value in data.items():
                key_path = f"{path}.{key}" if path else key
                results.extend(analyze_json_structure(value, key_path, current_depth + 1, target_depth))
    elif isinstance(data, list):
        # Add info about this array
        results.append((path, current_size, "array", len(data), current_depth))
        # Only recurse if we haven't reached target depth
        if current_depth < target_depth:
            # Analyze first few items to show structure variety
            for i, item in enumerate(data[:3]):  # Only analyze first 3 items
                item_path = f"{path}[{i}]"
                results.extend(analyze_json_structure(item, item_path, current_depth + 1, target_depth))
            if len(data) > 3:
                # Add summary for remaining items at this depth level
                remaining_size = sum(get_json_size(item) for item in data[3:])
                results.append((f"{path}[3...{len(data)-1}]", remaining_size, "array_tail", len(data) - 3, current_depth + 1))
    else:
        # Leaf value - only show if we're at or below target depth
        if current_depth <= target_depth:
            value_type = type(data).__name__
            if isinstance(data, str):
                count = len(data)  # String length
            else:
                count = 1
            results.append((path, current_size, value_type, count, current_depth))
    return results


def print_analysis(results, target_depth):
    """Print the analysis results as a table, grouped by depth and sorted by size within each depth"""
    # Sort by depth first, then by size (largest first) within each depth level.
    # Note: this sorts the caller's list in place.
    results.sort(key=lambda x: (x[4], -x[1]))  # x[4] is depth, x[1] is size
    print(f"JSON Structure Analysis (depth: {target_depth})")
    print("=" * 70)
    print(f"{'Path':<45} {'Size':<10} {'Type':<12} {'Count'}")
    print("-" * 70)
    for path, size, data_type, count, depth in results:
        # Add indentation based on actual depth
        indent = " " * depth
        # Get the display name: the last dotted path component,
        # e.g. "items[0]" for "data.items[0]"
        if path == "":
            display_name = "<root>"
        else:
            display_name = path.split('.')[-1]
        display_path = indent + display_name
        # Truncate very long paths
        if len(display_path) > 43:
            display_path = display_path[:40] + "..."
        size_str = format_size(size)
        count_str = str(count) if count != 1 or data_type in ['dict', 'array', 'array_tail'] else ""
        print(f"{display_path:<45} {size_str:<10} {data_type:<12} {count_str}")


def main():
    """Main function with argument parsing"""
    parser = argparse.ArgumentParser(
        description="Analyze and reduce JSON file structure",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  json-structure data.json        # Reduce structure for LLM analysis (1 item per array)
  json-structure -d 3 data.json   # Keep 3 items per array/dict
  json-structure -a 1 data.json   # Analyze top-level keys only
  json-structure -a 2 data.json   # Analyze 2 levels deep
  json-structure -t data.json     # Interactive TUI mode
  json-structure --tui data.json  # Same as -t
"""
    )
    parser.add_argument('file_path', help='Path to JSON file')
    parser.add_argument('-d', '--depth', type=int, default=1, metavar='N',
                        help='Number of items to keep in arrays and array-like dictionaries (default: 1)')
    parser.add_argument('-a', '--analyzer', type=int, metavar='DEPTH',
                        help='Analyzer mode: show structure tree with size information up to DEPTH levels')
    parser.add_argument('-t', '--tui', action='store_true',
                        help='Interactive TUI mode: navigate JSON structure with keyboard')
    args = parser.parse_args()
    # Load JSON file
    try:
        with open(args.file_path, 'r', encoding='utf-8') as file:
            data = json.load(file)
    except FileNotFoundError:
        print(f"Error: File '{args.file_path}' not found.", file=sys.stderr)
        sys.exit(1)
    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON in '{args.file_path}': {e}", file=sys.stderr)
        sys.exit(1)
    if args.tui:
        # TUI mode - interactive tree navigation
        try:
            run_tui(data, args.file_path)
        except KeyboardInterrupt:
            print("\nExited by user.")
        except Exception as e:
            print(f"TUI Error: {e}", file=sys.stderr)
            sys.exit(1)
    elif args.analyzer is not None:
        # Analyzer mode - show structure and sizes
        depth = args.analyzer if args.analyzer > 0 else 1
        results = analyze_json_structure(data, target_depth=depth)
        print_analysis(results, depth)
    else:
        # Normal mode - reduce structure
        max_items = max(1, args.depth)  # Ensure at least 1 item
        reduced_data = reduce_json_structure(data, max_items)
        output = json.dumps(reduced_data, indent=4, ensure_ascii=False)
        print(output)


if __name__ == "__main__":
    main()
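
The sizes shown by the analyzer and TUI modes come from serializing each value compactly and counting UTF-8 bytes. The standalone sketch below mirrors that idea; the names `json_size_bytes` and `human_size` and the sample document are hypothetical, chosen only for this illustration. One consequence worth knowing: `json.dumps` escapes non-ASCII characters by default, so a character such as é is counted as a six-byte `\u00e9` escape rather than its two-byte UTF-8 encoding, and reported sizes can exceed the size on disk for non-ASCII text.

```python
import json


def json_size_bytes(value):
    # Serialize without optional whitespace, then count UTF-8 bytes
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def human_size(size_bytes):
    # Same B / KB / MB thresholds as a typical 1024-based formatter
    if size_bytes < 1024:
        return f"{size_bytes}B"
    if size_bytes < 1024 * 1024:
        return f"{size_bytes / 1024:.1f}KB"
    return f"{size_bytes / (1024 * 1024):.1f}MB"


doc = {"name": "café", "tags": ["a", "b"]}
for key, value in doc.items():
    # Report the serialized size of each top-level value
    print(key, human_size(json_size_bytes(value)))
```

If sizes closer to the raw file are preferred, passing `ensure_ascii=False` to `json.dumps` is one option, at the cost of diverging from what this script reports.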