Last active
June 14, 2025 18:49
JSON Structure Analyzer and Reducer
| #!/usr/bin/env python3 | |
| """ | |
| JSON Structure Analyzer and Reducer | |
| =================================== | |
| A powerful command-line tool for analyzing, reducing, and exploring JSON file structures. | |
| Designed to help developers understand large JSON datasets by providing multiple modes | |
| of interaction: structure reduction for LLM analysis, hierarchical size analysis, and | |
| interactive terminal-based exploration. | |
| WHAT THIS PROGRAM DOES | |
| ====================== | |
| This tool addresses the common problem of working with large JSON files that exceed | |
| context limits for Large Language Models (LLMs) or are simply too large to understand | |
| at a glance. It provides three main modes of operation: | |
| 1. **Structure Reduction Mode (Default)** | |
| - Reduces JSON files while preserving structure and examples | |
| - Truncates long strings to manageable lengths (1000 chars) | |
| - Reduces arrays and array-like dictionaries to N representative items | |
| - Detects and handles GUID-based dictionaries intelligently | |
| - Perfect for feeding to LLMs while maintaining structural understanding | |
| 2. **Size Analysis Mode (-a/--analyzer)** | |
| - Provides hierarchical analysis of JSON structure with byte sizes | |
| - Shows size distribution at specified depth levels | |
| - Helps identify the largest data sections quickly | |
| - Displays results as a depth-indented table with human-readable sizes | |
| 3. **Interactive TUI Mode (-t/--tui)** | |
| - Terminal User Interface for real-time JSON exploration | |
| - Split-panel design: tree navigation + value display | |
| - Keyboard-driven navigation with expand/collapse functionality | |
| - Syntax highlighting and smart size formatting | |
| - Perfect for understanding complex nested structures | |
| KEY FEATURES | |
| ============ | |
| • **Smart Dictionary Detection**: Distinguishes between semantic dictionaries and | |
| array-like dictionaries (sequential keys) or GUID-based collections | |
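| • **Configurable Item Count (-d/--depth)**: Control how many items to keep in arrays and in array-like or GUID-like dicts | |
| • **Lazy Tree Building**: The TUI builds child nodes only when a node is expanded (the JSON file itself is loaded fully into memory) | |
| • **Multiple Output Formats**: JSON, tabular analysis, or interactive exploration | |
| • **Size-Aware Analysis**: Analyzer and TUI modes report the UTF-8 byte size of each value's compact JSON serialization | |
| • **Portability**: Requires Python 3.6+ and the standard curses module, which is imported at startup in every mode (stock Windows Python builds do not ship it) | |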
| USAGE EXAMPLES | |
| ============== | |
| # Basic structure reduction (keep 1 item per array/dict) | |
| json-structure large-data.json | |
| # Keep more examples for better structure understanding | |
| json-structure -d 5 large-data.json | |
| # Analyze size distribution at different depths | |
| json-structure -a 1 large-data.json # Top-level only | |
| json-structure -a 3 large-data.json # 3 levels deep | |
| # Interactive exploration | |
| json-structure -t large-data.json | |
| # Modes are mutually exclusive: -t takes precedence over -a, and -d applies only to reduction mode | |
| json-structure -a 2 api-response.json # Adding -d here would have no effect | |
| CODE ORGANIZATION | |
| ================= | |
| Function Hierarchy and Edit Locations: | |
| **CORE DATA PROCESSING** | |
| ├── truncate_string() → Modify string truncation behavior | |
| ├── get_json_size() → Change size calculation method | |
| ├── format_size() / format_size_kb() → Adjust size display formats | |
| └── reduce_json_structure() → **Main reduction logic - edit for new reduction rules** | |
| **SMART DETECTION SYSTEM** | |
| ├── is_array_like_dict() → Modify sequential key detection (0,1,2...) | |
| └── is_guid_like_dict() → **Edit GUID/UUID pattern recognition** | |
| **SIZE ANALYSIS MODE** | |
| ├── analyze_json_structure() → Change depth-based analysis logic | |
| └── print_analysis() → Modify tabular output format | |
| **INTERACTIVE TUI MODE** | |
| ├── TreeNode class → Edit node behavior and data structure | |
| │ ├── build_children() → Modify lazy loading logic | |
| │ ├── toggle_expand() → Change expand/collapse behavior | |
| │ └── get_display_name() → Adjust node naming | |
| ├── JSONTreeTUI class → **Main TUI interface - edit for UI changes** | |
| │ ├── init_colors() → Modify color scheme | |
| │ ├── draw_tree() → Change left panel tree rendering | |
| │ ├── draw_value_panel() → **Edit right panel value display** | |
| │ ├── format_value_for_display() → Modify value formatting logic | |
| │ └── handle_input() → **Add new keyboard shortcuts** | |
| └── run_tui() → TUI initialization and error handling | |
| **APPLICATION ENTRY POINT** | |
| └── main() → **Edit for new command-line options** | |
| ├── Argument parsing → Add new CLI flags | |
| ├── Mode selection → Route to different processing modes | |
| └── Error handling → Modify error messages and exit codes | |
| """ | |
| import json | |
| import sys | |
| import argparse | |
| import curses | |
| from typing import Any, List, Optional, Tuple | |
| def truncate_string(value, max_length=1000): | |
| """Truncate strings to keep them manageable while preserving content examples""" | |
| if isinstance(value, str) and len(value) > max_length: | |
| return value[:max_length] + "... (truncated)" | |
| return value | |
| def is_array_like_dict(data): | |
| """ | |
| Detect if a dictionary is actually an array in disguise. | |
| Returns True if keys are sequential integers starting from 0 or 1. | |
| """ | |
| if not isinstance(data, dict) or len(data) < 3: # Need at least 3 items to be worth reducing | |
| return False | |
| keys = list(data.keys()) | |
| # Check if all keys are string representations of integers | |
| try: | |
| int_keys = [int(k) for k in keys] | |
| except (ValueError, TypeError): | |
| return False | |
| # Sort the integer keys | |
| int_keys.sort() | |
| # Check if they form a sequence starting from 0 or 1 | |
| if int_keys == list(range(len(int_keys))): # 0, 1, 2, 3... | |
| return True | |
| elif int_keys == list(range(1, len(int_keys) + 1)): # 1, 2, 3, 4... | |
| return True | |
| return False | |
| def is_guid_like_dict(data): | |
| """ | |
| Detect if a dictionary has GUID-like keys (UUIDs, hashes, etc.). | |
| Returns True if keys appear to be auto-generated identifiers. | |
| """ | |
| if not isinstance(data, dict) or len(data) < 3: | |
| return False | |
| keys = list(data.keys()) | |
| # Check for UUID pattern (8-4-4-4-12 hex digits with dashes) | |
| uuid_pattern_count = 0 | |
| hash_like_count = 0 | |
| for key in keys: | |
| if not isinstance(key, str): | |
| continue | |
| # UUID pattern: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | |
| if len(key) == 36 and key.count('-') == 4: | |
| parts = key.split('-') | |
| if (len(parts) == 5 and | |
| len(parts[0]) == 8 and len(parts[1]) == 4 and | |
| len(parts[2]) == 4 and len(parts[3]) == 4 and len(parts[4]) == 12): | |
| try: | |
| # Check if all parts are hex | |
| for part in parts: | |
| int(part, 16) | |
| uuid_pattern_count += 1 | |
| continue | |
| except ValueError: | |
| pass | |
| # Hash-like pattern: long hex strings (20+ chars, all hex) | |
| if len(key) >= 20 and len(key) <= 64: | |
| try: | |
| int(key, 16) | |
| hash_like_count += 1 | |
| continue | |
| except ValueError: | |
| pass | |
| # If most keys match GUID/hash patterns, consider it GUID-like | |
| total_keys = len(keys) | |
| guid_ratio = (uuid_pattern_count + hash_like_count) / total_keys | |
| return guid_ratio >= 0.8 # 80% of keys are GUID-like | |
| def reduce_json_structure(data, max_items=1): | |
| """ | |
| Reduce JSON structure by: | |
| 1. Truncating long strings to 1000 chars | |
| 2. Reducing arrays to max_items representative elements | |
| 3. Detecting array-like and GUID-like dictionaries and reducing them too | |
| 4. Preserving all other object keys and structure | |
| Args: | |
| data: The JSON data to reduce | |
| max_items: Maximum number of items to keep in arrays and array-like dicts | |
| """ | |
| if isinstance(data, dict): | |
| # Check if this dictionary is actually an array in disguise or has GUID-like keys | |
| if is_array_like_dict(data): | |
| # Treat it like an array - keep first max_items entries | |
| keys = sorted(data.keys(), key=int) | |
| reduced_data = {} | |
| # Keep first max_items entries | |
| for key in keys[:max_items]: | |
| reduced_data[key] = reduce_json_structure(data[key], max_items) | |
| # Add indicator if we truncated | |
| if len(keys) > max_items: | |
| reduced_data["..."] = f"(array-like dict had {len(keys)} total entries)" | |
| return reduced_data | |
| elif is_guid_like_dict(data): | |
| # GUID-like dictionary - keep first max_items entries | |
| keys = list(data.keys()) | |
| reduced_data = {} | |
| # Keep first max_items entries | |
| for key in keys[:max_items]: | |
| reduced_data[key] = reduce_json_structure(data[key], max_items) | |
| # Add indicator if we truncated | |
| if len(keys) > max_items: | |
| reduced_data["..."] = f"(GUID-like dict had {len(keys)} total entries)" | |
| return reduced_data | |
| else: | |
| # Regular dictionary - keep all keys | |
| reduced_data = {} | |
| for key, value in data.items(): | |
| reduced_data[key] = reduce_json_structure(value, max_items) | |
| return reduced_data | |
| elif isinstance(data, list): | |
| if not data: | |
| return [] | |
| # Keep first max_items elements | |
| reduced_items = [] | |
| for i in range(min(len(data), max_items)): | |
| reduced_items.append(reduce_json_structure(data[i], max_items)) | |
| # Add metadata about the original array size if we truncated | |
| if len(data) > max_items: | |
| reduced_items.append(f"... (array had {len(data)} total items)") | |
| return reduced_items | |
| else: | |
| # Apply string truncation to leaf values | |
| return truncate_string(data) | |
| def get_json_size(data): | |
| """Get the byte size of JSON data when serialized""" | |
| return len(json.dumps(data, separators=(',', ':')).encode('utf-8')) | |
| def format_size(size_bytes): | |
| """Format size in human-readable format""" | |
| if size_bytes < 1024: | |
| return f"{size_bytes}B" | |
| elif size_bytes < 1024 * 1024: | |
| return f"{size_bytes/1024:.1f}KB" | |
| else: | |
| return f"{size_bytes/(1024*1024):.1f}MB" | |
| def format_size_kb(size_bytes): | |
| """Format size in KB with appropriate decimal precision for TUI""" | |
| kb = size_bytes / 1024.0 | |
| if kb < 0.1: | |
| return f"{kb:.3f}KB" | |
| elif kb < 1.0: | |
| return f"{kb:.2f}KB" | |
| elif kb < 10.0: | |
| return f"{kb:.1f}KB" | |
| else: | |
| return f"{kb:.0f}KB" | |
| class TreeNode: | |
| """Represents a node in the JSON tree structure""" | |
| def __init__(self, key: str, value: Any, path: str = "", parent: Optional['TreeNode'] = None): | |
| self.key = key | |
| self.value = value | |
| self.path = path | |
| self.parent = parent | |
| self.children: List['TreeNode'] = [] | |
| self.expanded = False | |
| self.size = get_json_size(value) | |
| self.data_type = self._get_data_type() | |
| self.count = self._get_count() | |
| def _get_data_type(self) -> str: | |
| """Get the data type string for display""" | |
| if isinstance(self.value, dict): | |
| return "dict" | |
| elif isinstance(self.value, list): | |
| return "array" | |
| elif isinstance(self.value, str): | |
| return "string" | |
| elif isinstance(self.value, bool): # Check bool before int: bool is a subclass of int | |
| return "boolean" | |
| elif isinstance(self.value, (int, float)): | |
| return "number" | |
| elif self.value is None: | |
| return "null" | |
| else: | |
| return "unknown" | |
| def _get_count(self) -> int: | |
| """Get count for display (length for containers, char count for strings)""" | |
| if isinstance(self.value, (dict, list)): | |
| return len(self.value) | |
| elif isinstance(self.value, str): | |
| return len(self.value) | |
| else: | |
| return 1 | |
| def build_children(self): | |
| """Build child nodes if not already built""" | |
| if self.children or not isinstance(self.value, (dict, list)): | |
| return | |
| if isinstance(self.value, dict): | |
| for key, val in self.value.items(): | |
| child_path = f"{self.path}.{key}" if self.path else key | |
| child = TreeNode(str(key), val, child_path, self) | |
| self.children.append(child) | |
| elif isinstance(self.value, list): | |
| for i, val in enumerate(self.value): | |
| child_path = f"{self.path}[{i}]" | |
| child = TreeNode(f"[{i}]", val, child_path, self) | |
| self.children.append(child) | |
| def toggle_expand(self): | |
| """Toggle expansion state""" | |
| if isinstance(self.value, (dict, list)) and len(self.value) > 0: | |
| self.expanded = not self.expanded | |
| if self.expanded: | |
| self.build_children() | |
| def get_display_name(self) -> str: | |
| """Get the display name for this node""" | |
| if self.key == "<root>": | |
| return "<root>" | |
| return self.key | |
| def is_expandable(self) -> bool: | |
| """Check if this node can be expanded""" | |
| return isinstance(self.value, (dict, list)) and len(self.value) > 0 | |
| class JSONTreeTUI: | |
| """Terminal User Interface for JSON tree visualization""" | |
| def __init__(self, data: Any, filename: str): | |
| self.root = TreeNode("<root>", data) | |
| self.filename = filename | |
| self.current_node = 0 | |
| self.scroll_offset = 0 | |
| self.visible_nodes: List[TreeNode] = [] | |
| self.total_size = get_json_size(data) | |
| # Color pairs | |
| self.COLOR_NORMAL = 1 | |
| self.COLOR_SELECTED = 2 | |
| self.COLOR_DICT = 3 | |
| self.COLOR_ARRAY = 4 | |
| self.COLOR_STRING = 5 | |
| self.COLOR_NUMBER = 6 | |
| self.COLOR_BOOLEAN = 7 | |
| self.COLOR_NULL = 8 | |
| self.COLOR_SIZE = 9 | |
| self.COLOR_STATUS = 10 | |
| def init_colors(self): | |
| """Initialize color pairs""" | |
| curses.start_color() | |
| curses.use_default_colors() | |
| curses.init_pair(self.COLOR_NORMAL, curses.COLOR_WHITE, -1) | |
| curses.init_pair(self.COLOR_SELECTED, curses.COLOR_BLACK, curses.COLOR_WHITE) | |
| curses.init_pair(self.COLOR_DICT, curses.COLOR_CYAN, -1) | |
| curses.init_pair(self.COLOR_ARRAY, curses.COLOR_YELLOW, -1) | |
| curses.init_pair(self.COLOR_STRING, curses.COLOR_GREEN, -1) | |
| curses.init_pair(self.COLOR_NUMBER, curses.COLOR_BLUE, -1) | |
| curses.init_pair(self.COLOR_BOOLEAN, curses.COLOR_MAGENTA, -1) | |
| curses.init_pair(self.COLOR_NULL, curses.COLOR_RED, -1) | |
| curses.init_pair(self.COLOR_SIZE, curses.COLOR_WHITE, -1) | |
| curses.init_pair(self.COLOR_STATUS, curses.COLOR_BLACK, curses.COLOR_CYAN) | |
| def get_type_color(self, data_type: str) -> int: | |
| """Get color pair for data type""" | |
| color_map = { | |
| "dict": self.COLOR_DICT, | |
| "array": self.COLOR_ARRAY, | |
| "string": self.COLOR_STRING, | |
| "number": self.COLOR_NUMBER, | |
| "boolean": self.COLOR_BOOLEAN, | |
| "null": self.COLOR_NULL, | |
| } | |
| return color_map.get(data_type, self.COLOR_NORMAL) | |
| def collect_visible_nodes(self, node: TreeNode, depth: int = 0) -> List[Tuple[TreeNode, int]]: | |
| """Collect all visible nodes with their depth""" | |
| result = [(node, depth)] | |
| if node.expanded: | |
| for child in node.children: | |
| result.extend(self.collect_visible_nodes(child, depth + 1)) | |
| return result | |
| def get_tree_prefix(self, depth: int, is_last: bool, parent_prefixes: List[bool]) -> str: | |
| """Generate tree prefix with Unicode box-drawing characters""" | |
| if depth == 0: | |
| return "" | |
| prefix = "" | |
| for i in range(depth - 1): | |
| if parent_prefixes[i]: | |
| prefix += "│ " | |
| else: | |
| prefix += " " | |
| if is_last: | |
| prefix += "└── " | |
| else: | |
| prefix += "├── " | |
| return prefix | |
| def draw_tree(self, stdscr): | |
| """Draw the tree structure with split panel layout""" | |
| height, width = stdscr.getmaxyx() | |
| tree_height = height - 2 # Reserve space for status bar | |
| # Calculate panel widths (60% tree, 40% value panel) | |
| tree_width = int(width * 0.6) | |
| value_width = width - tree_width - 1 # -1 for separator | |
| # Collect visible nodes | |
| visible_with_depth = self.collect_visible_nodes(self.root) | |
| self.visible_nodes = [node for node, _ in visible_with_depth] | |
| # Calculate scroll bounds | |
| if self.current_node >= len(self.visible_nodes): | |
| self.current_node = len(self.visible_nodes) - 1 | |
| if self.current_node < 0: | |
| self.current_node = 0 | |
| # Adjust scroll offset | |
| if self.current_node < self.scroll_offset: | |
| self.scroll_offset = self.current_node | |
| elif self.current_node >= self.scroll_offset + tree_height: | |
| self.scroll_offset = self.current_node - tree_height + 1 | |
| # Clear screen | |
| stdscr.clear() | |
| # Draw tree panel | |
| for i in range(tree_height): | |
| node_index = self.scroll_offset + i | |
| if node_index >= len(visible_with_depth): | |
| break | |
| node, depth = visible_with_depth[node_index] | |
| # Determine if this is the last child at its level | |
| is_last = True | |
| if node.parent: | |
| siblings = node.parent.children | |
| is_last = siblings[-1] == node | |
| # Build parent prefixes for tree drawing | |
| parent_prefixes = [] | |
| current = node.parent | |
| while current and current.parent: # Skip root | |
| if current.parent.children[-1] != current: | |
| parent_prefixes.insert(0, True) | |
| else: | |
| parent_prefixes.insert(0, False) | |
| current = current.parent | |
| # Generate tree prefix | |
| tree_prefix = self.get_tree_prefix(depth, is_last, parent_prefixes) | |
| # Expansion indicator | |
| if node.is_expandable(): | |
| expand_char = "▼" if node.expanded else "▶" | |
| else: | |
| expand_char = " " | |
| # Node display name | |
| display_name = node.get_display_name() | |
| # Size information | |
| size_str = format_size_kb(node.size) | |
| # Type and count info | |
| type_info = f"({node.data_type}" | |
| if node.count > 1 or node.data_type in ["dict", "array"]: | |
| type_info += f", {node.count}" | |
| type_info += ")" | |
| # Build display line | |
| line = f"{tree_prefix}{expand_char} {display_name} {size_str} {type_info}" | |
| # Truncate to fit tree panel | |
| if len(line) > tree_width - 1: | |
| line = line[:tree_width - 4] + "..." | |
| # Determine colors | |
| if node_index == self.current_node: | |
| attr = curses.color_pair(self.COLOR_SELECTED) | curses.A_BOLD | |
| else: | |
| attr = curses.color_pair(self.get_type_color(node.data_type)) | |
| try: | |
| stdscr.addstr(i, 0, line, attr) | |
| except curses.error: | |
| pass # Ignore errors from writing to edge of screen | |
| # Draw vertical separator | |
| for i in range(tree_height): | |
| try: | |
| stdscr.addstr(i, tree_width, "│", curses.color_pair(self.COLOR_NORMAL)) | |
| except curses.error: | |
| pass | |
| # Draw value panel | |
| self.draw_value_panel(stdscr, tree_width + 1, value_width, tree_height) | |
| # Draw status bar | |
| self.draw_status_bar(stdscr) | |
| stdscr.refresh() | |
| def draw_value_panel(self, stdscr, start_x, panel_width, panel_height): | |
| """Draw the value panel showing the current node's content""" | |
| if self.current_node >= len(self.visible_nodes): | |
| return | |
| current_node = self.visible_nodes[self.current_node] | |
| # Panel header | |
| header = f" Value: {current_node.get_display_name()} " | |
| if len(header) > panel_width: | |
| header = header[:panel_width - 3] + "..." | |
| try: | |
| stdscr.addstr(0, start_x, header, | |
| curses.color_pair(self.COLOR_STATUS) | curses.A_BOLD) | |
| except curses.error: | |
| pass | |
| # Format the value for display | |
| value_lines = self.format_value_for_display(current_node.value, panel_width - 2) | |
| # Display value lines | |
| for i, line in enumerate(value_lines[:panel_height - 2]): # Leave space for header | |
| display_line = line[:panel_width - 1] # Ensure it fits | |
| # Determine color based on content | |
| attr = curses.color_pair(self.COLOR_NORMAL) | |
| if line.strip().startswith('"') and line.strip().endswith('"'): | |
| attr = curses.color_pair(self.COLOR_STRING) | |
| elif line.strip() in ['true', 'false']: | |
| attr = curses.color_pair(self.COLOR_BOOLEAN) | |
| elif line.strip() == 'null': | |
| attr = curses.color_pair(self.COLOR_NULL) | |
| elif line.strip().replace('.', '').replace('-', '').isdigit(): | |
| attr = curses.color_pair(self.COLOR_NUMBER) | |
| elif line.strip().startswith('{') or line.strip().startswith('['): | |
| attr = curses.color_pair(self.COLOR_DICT if line.strip().startswith('{') else self.COLOR_ARRAY) | |
| try: | |
| stdscr.addstr(i + 1, start_x + 1, display_line, attr) | |
| except curses.error: | |
| pass | |
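| def format_value_for_display(self, value, max_width): | |
| """Format a value for display in the value panel""" | |
| # Clamp the width: on very narrow terminals a tiny or non-positive value would keep the wrapping loops below from terminating | |
| max_width = max(max_width, 4) | |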
| if isinstance(value, (dict, list)): | |
| # For containers, show formatted JSON | |
| try: | |
| json_str = json.dumps(value, indent=2, ensure_ascii=False) | |
| lines = json_str.split('\n') | |
| # Truncate very long output | |
| if len(lines) > 50: | |
| lines = lines[:47] + ['...', f'({len(lines) - 47} more lines)'] | |
| # Wrap long lines | |
| wrapped_lines = [] | |
| for line in lines: | |
| if len(line) <= max_width: | |
| wrapped_lines.append(line) | |
| else: | |
| # Simple word wrapping for JSON | |
| while len(line) > max_width: | |
| wrapped_lines.append(line[:max_width]) | |
| line = ' ' + line[max_width:] # Indent continuation | |
| if line.strip(): | |
| wrapped_lines.append(line) | |
| return wrapped_lines | |
| except Exception: | |
| return [f"<{type(value).__name__} with {len(value)} items>"] | |
| elif isinstance(value, str): | |
| # For strings, show with quotes and handle long strings | |
| if len(value) > 1000: | |
| display_value = value[:1000] + "... (truncated)" | |
| else: | |
| display_value = value | |
| # Add quotes and wrap lines | |
| quoted_value = json.dumps(display_value, ensure_ascii=False) | |
| lines = [] | |
| current_line = "" | |
| for char in quoted_value: | |
| if len(current_line) >= max_width - 1: | |
| lines.append(current_line) | |
| current_line = "" | |
| current_line += char | |
| if current_line: | |
| lines.append(current_line) | |
| return lines | |
| else: | |
| # For primitives, show as JSON | |
| json_str = json.dumps(value, ensure_ascii=False) | |
| if len(json_str) <= max_width: | |
| return [json_str] | |
| else: | |
| # Split long primitive values | |
| lines = [] | |
| while len(json_str) > max_width: | |
| lines.append(json_str[:max_width]) | |
| json_str = json_str[max_width:] | |
| if json_str: | |
| lines.append(json_str) | |
| return lines | |
| def draw_status_bar(self, stdscr): | |
| """Draw the status bar at the bottom""" | |
| height, width = stdscr.getmaxyx() | |
| status_y = height - 1 | |
| # Current path | |
| current_path = "" | |
| if self.current_node < len(self.visible_nodes): | |
| current_path = self.visible_nodes[self.current_node].path or "<root>" | |
| # Status information | |
| total_size_str = format_size_kb(self.total_size) | |
| status_left = f"Path: {current_path}" | |
| status_right = f"Size: {total_size_str} | ↑↓:Navigate Enter:Expand/Collapse q:Quit" | |
| # Truncate left side if needed | |
| available_width = width - len(status_right) - 3 | |
| if len(status_left) > available_width: | |
| status_left = status_left[:available_width - 3] + "..." | |
| # Create full status line | |
| padding = width - len(status_left) - len(status_right) | |
| status_line = status_left + " " * padding + status_right | |
| try: | |
| stdscr.addstr(status_y, 0, status_line[:width], | |
| curses.color_pair(self.COLOR_STATUS) | curses.A_BOLD) | |
| except curses.error: | |
| pass | |
| def handle_input(self, key: int) -> bool: | |
| """Handle keyboard input. Returns False to quit.""" | |
| if key == ord('q') or key == ord('Q'): | |
| return False | |
| elif key == curses.KEY_UP: | |
| if self.current_node > 0: | |
| self.current_node -= 1 | |
| elif key == curses.KEY_DOWN: | |
| if self.current_node < len(self.visible_nodes) - 1: | |
| self.current_node += 1 | |
| elif key == curses.KEY_ENTER or key == 10 or key == 13: | |
| if self.current_node < len(self.visible_nodes): | |
| self.visible_nodes[self.current_node].toggle_expand() | |
| elif key == curses.KEY_LEFT: | |
| # Collapse current node or go to parent | |
| if self.current_node < len(self.visible_nodes): | |
| node = self.visible_nodes[self.current_node] | |
| if node.expanded: | |
| node.expanded = False | |
| elif node.parent and node.parent != self.root: | |
| # Find parent in visible nodes | |
| for i, visible_node in enumerate(self.visible_nodes): | |
| if visible_node == node.parent: | |
| self.current_node = i | |
| break | |
| elif key == curses.KEY_RIGHT: | |
| # Expand current node | |
| if self.current_node < len(self.visible_nodes): | |
| node = self.visible_nodes[self.current_node] | |
| if node.is_expandable() and not node.expanded: | |
| node.toggle_expand() | |
| return True | |
| def run(self, stdscr): | |
| """Main TUI loop""" | |
| self.init_colors() | |
| curses.curs_set(0) # Hide cursor | |
| stdscr.keypad(True) # Enable special keys | |
| while True: | |
| self.draw_tree(stdscr) | |
| key = stdscr.getch() | |
| if not self.handle_input(key): | |
| break | |
| def run_tui(data: Any, filename: str): | |
| """Run the TUI application""" | |
| tui = JSONTreeTUI(data, filename) | |
| curses.wrapper(tui.run) | |
| def analyze_json_structure(data, path="", current_depth=0, target_depth=1): | |
| """ | |
| Analyze JSON structure up to specified depth and return size information. | |
| Returns a list of (path, size, type, count, depth) tuples. | |
| """ | |
| results = [] | |
| current_size = get_json_size(data) | |
| if isinstance(data, dict): | |
| # Add info about this dictionary | |
| results.append((path, current_size, "dict", len(data), current_depth)) | |
| # Only recurse if we haven't reached target depth | |
| if current_depth < target_depth: | |
| for key, value in data.items(): | |
| key_path = f"{path}.{key}" if path else key | |
| results.extend(analyze_json_structure(value, key_path, current_depth + 1, target_depth)) | |
| elif isinstance(data, list): | |
| # Add info about this array | |
| results.append((path, current_size, "array", len(data), current_depth)) | |
| # Only recurse if we haven't reached target depth | |
| if current_depth < target_depth: | |
| # Analyze first few items to show structure variety | |
| for i, item in enumerate(data[:3]): # Only analyze first 3 items | |
| item_path = f"{path}[{i}]" | |
| results.extend(analyze_json_structure(item, item_path, current_depth + 1, target_depth)) | |
| if len(data) > 3: | |
| # Add summary for remaining items at this depth level | |
| remaining_size = sum(get_json_size(item) for item in data[3:]) | |
| results.append((f"{path}[3...{len(data)-1}]", remaining_size, "array_tail", len(data) - 3, current_depth + 1)) | |
| else: | |
| # Leaf value - only show if we're at or below target depth | |
| if current_depth <= target_depth: | |
| value_type = type(data).__name__ | |
| if isinstance(data, str): | |
| count = len(data) # String length | |
| else: | |
| count = 1 | |
| results.append((path, current_size, value_type, count, current_depth)) | |
| return results | |
| def print_analysis(results, target_depth): | |
| """Print the analysis results as an indented table, grouped by depth and sorted by size within each depth""" | |
| # Sort by depth first, then by size within each depth level | |
| results.sort(key=lambda x: (x[4], -x[1])) # x[4] is depth, x[1] is size | |
| print(f"JSON Structure Analysis (depth: {target_depth})") | |
| print("=" * 70) | |
| print(f"{'Path':<45} {'Size':<10} {'Type':<12} {'Count'}") | |
| print("-" * 70) | |
| for path, size, data_type, count, depth in results: | |
| # Add indentation based on actual depth | |
| indent = " " * depth | |
| # Get the display name (last part of path) | |
| if path == "": | |
| display_name = "<root>" | |
| elif data_type == 'array_tail': | |
| # Tail labels such as "items[3...9]" contain dots inside the brackets, so split only the part before the final "[" | |
| head, _, tail = path.rpartition('[') | |
| display_name = head.split('.')[-1] + '[' + tail | |
| else: | |
| display_name = path.split('.')[-1] | |
| display_path = indent + display_name | |
| # Truncate very long paths | |
| if len(display_path) > 43: | |
| display_path = display_path[:40] + "..." | |
| size_str = format_size(size) | |
| count_str = str(count) if count != 1 or data_type in ['dict', 'array', 'array_tail'] else "" | |
| print(f"{display_path:<45} {size_str:<10} {data_type:<12} {count_str}") | |
| def main(): | |
| """Main function with argument parsing""" | |
| parser = argparse.ArgumentParser( | |
| description="Analyze and reduce JSON file structure", | |
| formatter_class=argparse.RawDescriptionHelpFormatter, | |
| epilog=""" | |
| Examples: | |
| json-structure data.json # Reduce structure for LLM analysis (1 item per array) | |
| json-structure -d 3 data.json # Keep 3 items per array/dict | |
| json-structure -a 1 data.json # Analyze top-level keys only | |
| json-structure -a 2 data.json # Analyze 2 levels deep | |
| json-structure -t data.json # Interactive TUI mode | |
| json-structure --tui data.json # Same as -t | |
| """ | |
| ) | |
| parser.add_argument('file_path', help='Path to JSON file') | |
| parser.add_argument('-d', '--depth', type=int, default=1, metavar='N', | |
| help='Number of items to keep in arrays and in array-like or GUID-like dictionaries (default: 1)') | |
| parser.add_argument('-a', '--analyzer', type=int, metavar='DEPTH', | |
| help='Analyzer mode: show structure tree with size information up to DEPTH levels') | |
| parser.add_argument('-t', '--tui', action='store_true', | |
| help='Interactive TUI mode: navigate JSON structure with keyboard') | |
| args = parser.parse_args() | |
| # Load JSON file | |
| try: | |
| with open(args.file_path, 'r', encoding='utf-8') as file: | |
| data = json.load(file) | |
| except FileNotFoundError: | |
| print(f"Error: File '{args.file_path}' not found.", file=sys.stderr) | |
| sys.exit(1) | |
| except json.JSONDecodeError as e: | |
| print(f"Error: Invalid JSON in '{args.file_path}': {e}", file=sys.stderr) | |
| sys.exit(1) | |
| if args.tui: | |
| # TUI mode - interactive tree navigation | |
| try: | |
| run_tui(data, args.file_path) | |
| except KeyboardInterrupt: | |
| print("\nExited by user.") | |
| except Exception as e: | |
| print(f"TUI Error: {e}", file=sys.stderr) | |
| sys.exit(1) | |
| elif args.analyzer is not None: | |
| # Analyzer mode - show structure and sizes | |
| depth = args.analyzer if args.analyzer > 0 else 1 | |
| results = analyze_json_structure(data, target_depth=depth) | |
| print_analysis(results, depth) | |
| else: | |
| # Normal mode - reduce structure | |
| max_items = max(1, args.depth) # Ensure at least 1 item | |
| reduced_data = reduce_json_structure(data, max_items) | |
| output = json.dumps(reduced_data, indent=4, ensure_ascii=False) | |
| print(output) | |
| if __name__ == "__main__": | |
| main() | |
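
To illustrate what the default reduction mode produces, here is a minimal, self-contained sketch. The `reduce_sample` helper is a hypothetical, condensed stand-in for `reduce_json_structure()`: it covers only plain lists, regular dictionaries, and string truncation, and it leaves out the array-like and GUID-like dictionary detection. The sample document is invented for illustration.

```python
import json


def reduce_sample(data, max_items=1, max_length=1000):
    """Condensed stand-in for reduce_json_structure(): lists, dicts, strings only."""
    if isinstance(data, dict):
        # Regular dictionaries keep every key; only their values are reduced.
        return {k: reduce_sample(v, max_items, max_length) for k, v in data.items()}
    if isinstance(data, list):
        # Keep the first max_items elements and note the original length.
        kept = [reduce_sample(item, max_items, max_length) for item in data[:max_items]]
        if len(data) > max_items:
            kept.append(f"... (array had {len(data)} total items)")
        return kept
    if isinstance(data, str) and len(data) > max_length:
        # Long strings are cut at max_length and marked as truncated.
        return data[:max_length] + "... (truncated)"
    return data


# Invented sample document, for illustration only.
sample = {
    "users": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}],
    "notes": "x" * 1200,
}

reduced = reduce_sample(sample)
print(json.dumps(reduced["users"]))
print(len(reduced["notes"]))
```

With the default of one item per array, the three-element `users` list collapses to its first element plus a marker string recording the original length, and the 1200-character string is cut to 1000 characters plus the truncation suffix. Because the marker is appended as an extra list element, the reduced output is meant for reading structure, not for round-tripping back into the original schema.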