@r4d10n
Created November 29, 2025 09:27
PyGhidra ARM ELF Iterative Decompilation Prompt

Overview

This prompt guides Claude Code Web through systematic reverse engineering of ARM ELF binaries using PyGhidra. The workflow prioritizes core application logic, skips open-source library code, and produces well-documented reconstructed source with detailed algorithm explanations.

Binary Specification

  • Format: ELF (Executable and Linkable Format)
  • Architecture: ARM (32-bit ARM / Thumb / AArch64 - confirm after loading)
  • Analysis Focus: Proprietary core functions only
  • Skip: Standard libraries, compiler runtime, open-source components

Phase 1: Environment Setup and Initial Analysis

1.1 Setup PyGhidra

# Install PyGhidra and dependencies
# Ensure Ghidra is installed with ARM processor modules
# Configure headless analysis environment

import pyghidra
pyghidra.start()

from ghidra.program.flatapi import FlatProgramAPI
from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

1.2 Load Binary

# Load the ARM ELF binary
# Enable these analyzer options:
# - ARM/Thumb mode detection
# - Aggressive function boundary detection
# - Symbol demangling for C++ if present
# - Reference analysis
# - Stack variable recovery
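Step 1.2 names the analyzer options but not how the binary is opened. The following is a minimal sketch, assuming PyGhidra's `pyghidra.open_program` context manager is available in your release (newer PyGhidra versions may steer you towards other entry points). The function names `matching_options` and `load_and_report` are hypothetical, and the keyword filter is only a way to see which analysis options your Ghidra build exposes, since option names vary between versions.

```python
def matching_options(option_names, keywords=("ARM", "Demangler", "Stack", "Reference")):
    """Return the analysis option names that mention any of the keywords."""
    names = [str(name) for name in option_names]  # Java strings -> Python str
    return sorted(
        name for name in names
        if any(key.lower() in name.lower() for key in keywords)
    )


def load_and_report(binary_path):
    """Open and auto-analyze a binary, then print basic facts about it."""
    # Imported lazily so matching_options() works without a Ghidra install
    import pyghidra

    with pyghidra.open_program(binary_path, analyze=True) as flat_api:
        program = flat_api.getCurrentProgram()
        print("Language ID:", program.getLanguageID())
        print("Big endian: ", program.getLanguage().isBigEndian())
        for block in program.getMemory().getBlocks():
            print(block.getName(), block.getStart(), block.getEnd())

        # Show which analyzer options relate to the checklist in step 1.2
        from ghidra.program.model.listing import Program
        options = program.getOptions(Program.ANALYSIS_PROPERTIES)
        for name in matching_options(options.getOptionNames()):
            print("option:", name)
```

Calling `load_and_report("firmware.elf")` (a placeholder path) prints the language ID, endianness, memory blocks, and the matching option names, which can then be compared against the list in step 1.2.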

1.3 Generate Initial Reports

After loading, produce:

  • Binary metadata: Architecture variant, endianness, entry point, section layout
  • Function inventory: Address, size, basic block count, name/symbol
  • String table: All referenced strings with their cross-references
  • Import/Export tables: External dependencies and exposed symbols
  • Section map: .text, .data, .rodata, .bss boundaries and permissions
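The binary metadata item can be cross-checked without Ghidra by reading the ELF header directly. This is a small sketch using only the standard library; `parse_elf_header` is a hypothetical helper name, and the sample header is synthetic rather than taken from a real binary.

```python
import struct

EM_ARM = 40       # e_machine value for 32-bit ARM
EM_AARCH64 = 183  # e_machine value for AArch64


def parse_elf_header(data: bytes) -> dict:
    """Extract architecture, endianness and entry point from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    # e_type and e_machine sit right after the 16-byte e_ident block
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    # e_entry follows e_version at offset 24 (4 or 8 bytes wide)
    (e_entry,) = struct.unpack_from(endian + ("Q" if is_64bit else "I"), data, 24)
    arch = {EM_ARM: "ARM", EM_AARCH64: "AArch64"}.get(e_machine, "other")
    # On 32-bit ARM an odd entry address means execution starts in Thumb state
    thumb = e_machine == EM_ARM and bool(e_entry & 1)
    return {
        "arch": arch,
        "bits": 64 if is_64bit else 32,
        "endianness": "little" if endian == "<" else "big",
        "entry_point": e_entry & ~1 if thumb else e_entry,
        "thumb_entry": thumb,
    }


# Synthetic 32-bit little-endian ARM header with a Thumb entry point
header = (b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
          + struct.pack("<HHII", 2, EM_ARM, 1, 0x10A4D))
info = parse_elf_header(header)
print(info)
```

If these values disagree with what Ghidra reports after import, the language or loader selection is worth re-checking before any decompilation.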

Phase 2: Function Triage

2.1 Functions to SKIP (Mark as Library/External)

Automatically exclude from detailed analysis:

Standard C Library (libc/bionic/musl)

  • Memory: malloc, free, calloc, realloc, memcpy, memset, memmove, memcmp
  • String: strlen, strcpy, strncpy, strcmp, strncmp, strcat, strchr, strstr, strdup
  • I/O: printf, fprintf, sprintf, snprintf, puts, fputs, fopen, fclose, fread, fwrite
  • Conversion: atoi, atol, strtol, strtoul, sscanf
  • System: exit, abort, getenv, system

PLT/GOT Entries

  • All functions resolved through the Procedure Linkage Table
  • External dynamic library calls

Compiler Runtime

  • __stack_chk_fail, __stack_chk_guard
  • __aeabi_* (ARM EABI helpers: division, floating point, memory)
  • __cxa_* (C++ exception handling)
  • __gnu_* (GCC intrinsics)
  • frame_dummy, register_tm_clones, deregister_tm_clones
  • _init, _fini (unless containing custom code)

C++ Runtime

  • operator new, operator delete and array variants
  • Virtual table thunks (_ZThn*, _ZTv*)
  • RTTI symbols (_ZTI*, _ZTS*)
  • __cxa_atexit, __cxa_finalize
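The name-based skip rules above can be applied mechanically once symbol names are available. The sketch below is one way to do it with shell-style patterns; `SKIP_PATTERNS` and `classify` are hypothetical names, the mangled `_Znw*`/`_Zna*`/`_Zdl*`/`_Zda*` patterns stand in for operator new/delete under the Itanium C++ ABI, and the same approach extends to the library prefixes in the table that follows.

```python
from fnmatch import fnmatchcase

# Patterns taken from the skip lists in this section
SKIP_PATTERNS = {
    "libc": [
        "malloc", "free", "calloc", "realloc", "memcpy", "memset", "memmove", "memcmp",
        "strlen", "strcpy", "strncpy", "strcmp", "strncmp", "strcat", "strchr",
        "strstr", "strdup", "printf", "fprintf", "sprintf", "snprintf", "puts",
        "fputs", "fopen", "fclose", "fread", "fwrite", "atoi", "atol", "strtol",
        "strtoul", "sscanf", "exit", "abort", "getenv", "system",
    ],
    "compiler-runtime": [
        "__stack_chk_*", "__aeabi_*", "__cxa_*", "__gnu_*",
        "frame_dummy", "register_tm_clones", "deregister_tm_clones", "_init", "_fini",
    ],
    "c++-runtime": [
        "_Znw*", "_Zna*", "_Zdl*", "_Zda*",  # operator new/delete and array forms
        "_ZThn*", "_ZTv*", "_ZTI*", "_ZTS*",
    ],
}


def classify(name: str):
    """Return the skip category for a symbol name, or None for core candidates."""
    for category, patterns in SKIP_PATTERNS.items():
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            return category
    return None


for symbol in ["memcpy", "__aeabi_uidiv", "_Znwj", "FUN_00012f80"]:
    print(symbol, "->", classify(symbol))
```

Name matching alone cannot tell whether `_init` or `_fini` contain custom code, so those two still deserve a quick look at their bodies before being skipped.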

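Known Open-Source Libraries

┌─────────────────┬───────────────────────────────────────────────────────────────┐
│ Library         │ Signature Patterns                                            │
├─────────────────┼───────────────────────────────────────────────────────────────┤
│ OpenSSL/mbedTLS │ SSL_*, EVP_*, AES_*, SHA*_Init/Update/Final, RSA_*, mbedtls_* │
│ zlib            │ inflate*, deflate*, crc32, adler32, compress*, uncompress     │
│ libcurl         │ curl_*                                                        │
│ SQLite          │ sqlite3_*                                                     │
│ JSON parsers    │ json_*, cJSON_*, yyjson_*                                     │
│ Protobuf        │ protobuf_*, pb_encode, pb_decode                              │
│ FreeRTOS        │ xTask*, xQueue*, xSemaphore*, vTask*                          │
│ lwIP            │ tcp_*, udp_*, ip_*, netconn_*                                 │
└─────────────────┴───────────────────────────────────────────────────────────────┘

2.2 Functions to PRIORITIZE (Analyze in Order)

Priority 1 - Entry Points: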

  • main() function
  • _start and early initialization
  • Functions in .init_array / .fini_array
  • Exported symbols (visible in dynamic symbol table)
  • Interrupt vectors (for bare-metal/RTOS)

Priority 2 - String References:

  • Functions referencing unique application strings
  • Configuration keys, error messages, protocol identifiers
  • Debug strings revealing functionality
  • Format strings with custom specifiers

Priority 3 - Structural Complexity:

  • High cyclomatic complexity (many branches/paths)
  • Large function size (>500 bytes)
  • Many basic blocks (>15)
  • Switch statements with many cases

Priority 4 - Hub Functions:

  • Functions with many callers (dispatch/router logic)
  • Functions calling many others (orchestration)
  • Central state machine handlers

Priority 5 - Algorithm Signatures:

  • Heavy bitwise operations (XOR chains, rotations)
  • S-box patterns (256-byte lookup tables)
  • Mathematical operations (modular arithmetic)
  • Checksums/hashing patterns

Priority 6 - System Interaction:

  • Functions near socket/network syscalls
  • ioctl handlers
  • mmap/memory management
  • File I/O wrappers
  • IPC mechanisms
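Priority 3 relies on cyclomatic complexity, which for a single function's control flow graph is commonly computed as M = E - N + 2 (edges minus nodes plus two). The sketch below computes it from a successor map; the block names and the example graph are hypothetical, not taken from any real binary.

```python
def cyclomatic_complexity(cfg: dict) -> int:
    """McCabe complexity M = E - N + 2 for one connected control flow graph."""
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2


# Hypothetical CFG: an if/else followed by a loop
cfg = {
    "entry": ["then", "else"],
    "then": ["loop"],
    "else": ["loop"],
    "loop": ["loop_body", "exit"],
    "loop_body": ["loop"],
    "exit": [],
}
print(cyclomatic_complexity(cfg))  # 7 edges - 6 nodes + 2 = 3
```

In practice the successor map would be filled from the basic blocks Ghidra reports for the function; the formula itself does not depend on the tool.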

2.3 Priority List Format

Present the prioritized function list as:

┌──────────┬────────────┬─────────────────────┬───────┬─────┬─────────────────────────┐
│ Priority │ Address    │ Name/Label          │ Size  │ BBs │ Reason                  │
├──────────┼────────────┼─────────────────────┼───────┼─────┼─────────────────────────┤
│ 1        │ 0x00010a4c │ main                │ 892   │ 23  │ Entry point             │
│ 2        │ 0x00012f80 │ FUN_00012f80        │ 2104  │ 47  │ Refs "AUTH_TOKEN"       │
│ 3        │ 0x00015200 │ FUN_00015200        │ 1560  │ 38  │ Hub: 12 callers         │
│ 4        │ 0x00018c00 │ FUN_00018c00        │ 3200  │ 89  │ Large switch statement  │
│ 5        │ 0x0001a400 │ FUN_0001a400        │ 512   │ 12  │ XOR/rotation patterns   │
│ ...      │ ...        │ ...                 │ ...   │ ... │ ...                     │
└──────────┴────────────┴─────────────────────┴───────┴─────┴─────────────────────────┘

Total functions: XXX
Core functions to analyze: XX
Library/external (skipped): XXX
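One way to turn the Priority 1 to 4 rules into the ranked list above is sketched below. `FunctionInfo`, `triage`, and `priority_rows` are hypothetical names, the hub threshold of 10 callers is an assumption, and Priorities 5 and 6 are left out because they need instruction-level features rather than the summary fields used here.

```python
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    address: int
    name: str
    size: int
    basic_blocks: int
    callers: int = 0
    is_entry: bool = False
    unique_string: str = ""


def triage(func: FunctionInfo):
    """Return (priority class, reason) following the Priority 1-4 rules."""
    if func.is_entry:
        return 1, "Entry point"
    if func.unique_string:
        return 2, f'Refs "{func.unique_string}"'
    if func.size > 500 or func.basic_blocks > 15:
        return 3, f"Complex: {func.size} bytes, {func.basic_blocks} BBs"
    if func.callers >= 10:  # hub threshold is an assumption
        return 4, f"Hub: {func.callers} callers"
    return 9, "Unclassified"


def priority_rows(functions):
    """Sort by priority class, then by size (largest first), and format rows."""
    ranked = sorted(functions, key=lambda f: (triage(f)[0], -f.size))
    rows = []
    for rank, func in enumerate(ranked, start=1):
        reason = triage(func)[1]
        rows.append(
            f"{rank:<8} 0x{func.address:08x}  {func.name:<20} "
            f"{func.size:<6} {func.basic_blocks:<4} {reason}"
        )
    return rows


# Hypothetical inventory entries
functions = [
    FunctionInfo(0x15200, "FUN_00015200", 120, 6, callers=12),
    FunctionInfo(0x18C00, "FUN_00018c00", 3200, 89),
    FunctionInfo(0x12F80, "FUN_00012f80", 2104, 47, unique_string="AUTH_TOKEN"),
    FunctionInfo(0x10A4C, "main", 892, 23, is_entry=True),
]
rows = priority_rows(functions)
print("\n".join(rows))
```

The first column is the rank within the sorted list, matching how the sample table numbers its rows.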

Phase 3: Function Analysis Workflow

3.1 ARM-Specific Decompilation

For each function, extract and note:

┌─────────────────────────────────────────────────────────────┐
│                   ARM ANALYSIS CHECKLIST                    │
├─────────────────────────────────────────────────────────────┤
│ □ Instruction set mode (ARM / Thumb / Thumb-2 / AArch64)    │
│ □ Mode transitions (BX, BLX with LSB indicating mode)       │
│ □ Calling convention (AAPCS standard or custom)             │
│   - Parameters: r0-r3 (first 4 args), stack for rest        │
│   - Return: r0 (r0-r1 for 64-bit)                           │
│   - Preserved: r4-r11, sp (lr holds the return address)     │
│ □ Conditional execution (IT blocks in Thumb-2)              │
│ □ NEON/VFP usage (vector operations)                        │
│ □ Inline assembly or intrinsics                             │
└─────────────────────────────────────────────────────────────┘

Note: the register assignments above describe the 32-bit AAPCS. Under AArch64 (AAPCS64), integer arguments use x0-x7 and x19-x28 are callee-saved.

3.2 ARM Instruction Pattern Recognition

┌──────────────────────┬──────────────────────────────┬─────────────────────────────────────┐
│ Instruction(s)       │ Meaning                      │ Common Usage                        │
├──────────────────────┼──────────────────────────────┼─────────────────────────────────────┤
│ UMULL, SMULL         │ 32×32→64-bit multiply        │ Large integer math, crypto          │
│ MLA, MLS             │ Multiply-accumulate/subtract │ DSP, checksums, hashing             │
│ CLZ                  │ Count leading zeros          │ Log2, normalization, priority       │
│ RBIT                 │ Reverse bits                 │ CRC, bit manipulation               │
│ REV, REV16, REVSH    │ Byte swap                    │ Endianness conversion               │
│ BFI, BFC, UBFX, SBFX │ Bitfield ops                 │ Flag manipulation, protocol parsing │
│ LDM, STM             │ Load/store multiple          │ Struct copy, bulk transfer          │
│ TBB, TBH             │ Table branch                 │ Switch statements                   │
│ CBZ, CBNZ            │ Compare with zero and branch │ Loop control, null checks           │
│ IT block             │ If-then                      │ Conditional execution (Thumb-2)     │
│ LDRD, STRD           │ Double-word load/store       │ 64-bit variables                    │
│ PLD                  │ Preload data                 │ Performance optimization            │
│ DMB, DSB, ISB        │ Memory barriers              │ Synchronization, DMA                │
└──────────────────────┴──────────────────────────────┴─────────────────────────────────────┘

3.3 Semantic Reconstruction Rules

Variable Naming:

┌────────────────┬──────────────────┬───────────────────────────────────────────────────────┐
│ Ghidra Default │ Rename To        │ Based On                                              │
├────────────────┼──────────────────┼───────────────────────────────────────────────────────┤
│ local_XX       │ Descriptive name │ Usage context (buffer_ptr, counter, flags)            │
│ param_1        │ Meaningful name  │ Function purpose (input_data, length, callback)       │
│ DAT_XXXXX      │ Global purpose   │ Usage pattern (g_config, g_state, g_buffer)           │
│ FUN_XXXXX      │ Action name      │ Behavior (parse_header, encrypt_block, send_response) │
│ PTR_XXXXX      │ Pointer target   │ What it points to (vtable_ptr, handler_array)         │
└────────────────┴──────────────────┴───────────────────────────────────────────────────────┘

Type Recovery:

// Common ARM type mappings
uint8_t   // Single byte, often flags or characters
uint16_t  // Half-word, often lengths or IDs
uint32_t  // Word, addresses, handles, most integers
uint64_t  // Double-word, timestamps, large counters
size_t    // Sizes and lengths
void*     // Generic pointers

3.4 Function Analysis Output Template

╔═══════════════════════════════════════════════════════════════════════════════╗
║ FUNCTION ANALYSIS: [provisional_name]                                         ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║ Address: 0x[ADDRESS]                                                          ║
║ Size: XXX bytes | Basic Blocks: XX | Cyclomatic Complexity: XX                ║
║ Mode: ARM / Thumb / Thumb-2 / AArch64                                         ║
║ Stack Frame: XX bytes                                                         ║
╚═══════════════════════════════════════════════════════════════════════════════╝

Original Ghidra Decompilation

// Raw Ghidra output preserved for reference
[paste exact decompiler output here]

Reconstructed Implementation

/**
 * @brief [One-line description of function purpose]
 *
 * [Extended description explaining what this function does,
 *  when it's called, and its role in the larger system]
 *
 * @param[in]     param1   Description of first parameter
 * @param[in,out] param2   Description of second parameter  
 * @param[out]    param3   Description of output parameter
 *
 * @return Description of return value and possible values
 *
 * @note Algorithm: [Name of algorithm if applicable]
 * @note Complexity: O(n) / O(n²) / O(log n) / etc.
 *
 * @warning [Any security concerns or edge cases]
 */
return_type function_name(type1 param1, type2 param2, type3* param3)
{
    // === INITIALIZATION ===
    // [Comments explaining variable setup]
    
    // === MAIN LOGIC ===
    // [Detailed inline comments for each logical block]
    // Explain WHY, not just WHAT
    
    // === CLEANUP/RETURN ===
    // [Final processing and return]
}

Algorithm Explanation

Purpose: [Detailed prose explanation of the function's purpose in the system]

Input Processing: [How inputs are validated and prepared]

Core Algorithm: [Step-by-step explanation of the main logic]

  1. First, the function...
  2. Then it iterates over...
  3. For each element, it...
  4. Finally, it returns...

Data Structures: [Description of any structures used]

// Discovered structure definition
typedef struct {
    uint32_t magic;           // Offset 0x00: Magic identifier (0xDEADBEEF)
    uint16_t version;         // Offset 0x04: Protocol version
    uint16_t flags;           // Offset 0x06: Status flags
    uint32_t payload_length;  // Offset 0x08: Length of following data
    uint8_t  payload[];       // Offset 0x0C: Variable-length payload
} message_header_t;

Mathematical/Cryptographic Details: [If applicable, explain any mathematical operations]

Error Handling: [How errors are detected and handled]

Control Flow Graph

                ┌─────────────────┐
                │  Entry Block    │
                │  [validate args]│
                └────────┬────────┘
                         │
                ┌────────▼────────┐
         ┌──────┤  Condition?     ├──────┐
         │ no   └─────────────────┘ yes  │
         │                               │
┌────────▼────────┐             ┌────────▼────────┐
│  Error Path     │             │  Main Loop      │◄──┐
│  [set error]    │             │  [process item] │   │
└────────┬────────┘             └────────┬────────┘   │
         │                               │            │
         │                      ┌────────▼────────┐   │
         │                      │  More items?    ├───┘
         │                      └────────┬────────┘ yes
         │                               │ no
         │                      ┌────────▼────────┐
         └─────────────────────►│  Exit Block     │
                                │  [cleanup]      │
                                └─────────────────┘

Call Graph Context

CALLERS (functions that call this one):
├── [ANALYZED] parent_function @ 0xXXXX
│   └── Context: Called during initialization phase
├── [ANALYZED] another_caller @ 0xXXXX
│   └── Context: Called in main processing loop
└── [PENDING] FUN_00012345 @ 0xXXXX
    └── Context: Unknown, queued for analysis

CALLEES (functions called by this one):
├── [LIBRARY] memcpy
│   └── Purpose: Copy parsed data to output buffer
├── [ANALYZED] helper_function @ 0xXXXX
│   └── Purpose: Validate message checksum
├── [PENDING] FUN_00067890 @ 0xXXXX
│   └── Purpose: Unknown, queued for analysis
└── [SKIP] malloc
    └── Purpose: Allocate response buffer

Key Observations

Aspect Details
Purpose [One-line summary]
Complexity Time: O(?) / Space: O(?)
Dependencies [Critical functions it relies on]
State Changes [Global state modified]
Side Effects [I/O, memory allocation, etc.]
Security Notes [Buffer handling, auth checks, crypto usage]

Analysis Confidence

  • Overall Confidence: HIGH / MEDIUM / LOW

Certain:

  • [List aspects that are well understood]

Uncertain:

  • [List aspects that need verification]
  • [Ambiguous interpretations]

Needs Further Investigation:

  • [Related functions to analyze]
  • [Missing context]

Phase 4: Iterative Feedback Protocol

4.1 After Each Function Analysis

  1. Review: I will examine your analysis output
  2. Correct: I provide specific corrections:
     • Type corrections: "param2 is uint16_t length, not uint32_t flags"
     • Algorithm identification: "This is CRC-32, not custom checksum"
     • Naming improvements: "Rename to handle_auth_request"
     • Structural insights: "The struct at offset 0x20 is 64 bytes total"
  3. Refine: You update the analysis incorporating feedback
  4. Propagate: Changes cascade to related functions
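The propagate step is easy to get subtly wrong when names share a prefix. Here is a small sketch of cascading a rename through previously reconstructed sources, assuming they are kept in a dictionary keyed by entry address; `propagate_rename` and the sample bodies are hypothetical.

```python
import re


def propagate_rename(sources: dict, old_name: str, new_name: str) -> list:
    """Replace whole-word uses of old_name in every stored function body.

    Returns the keys of the entries whose text changed.
    """
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    touched = []
    for key, text in sources.items():
        updated = pattern.sub(new_name, text)
        if updated != text:
            sources[key] = updated
            touched.append(key)
    return touched


# Hypothetical store of reconstructed functions keyed by entry address
sources = {
    0x10A4C: "int main(void) { return FUN_00012f80(g_state); }",
    0x12F80: "int FUN_00012f80(state_t *s) { return FUN_00012f800(s); }",
}
changed = propagate_rename(sources, "FUN_00012f80", "handle_auth_request")
print(changed)
print(sources[0x12F80])
```

The word-boundary match keeps `FUN_00012f800` untouched even though it starts with the renamed symbol, which is the main reason to avoid a plain string replace.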

4.2 Correction Format

FEEDBACK FOR: function_name @ 0xADDRESS

CORRECTIONS:

  1. [Specific correction with explanation]
  2. [Another correction]

ADDITIONAL CONTEXT:

  • [Any domain knowledge I can provide]
  • [Related information from other analysis]

PROCEED TO: [Next function] or [Re-analyze current] or [Analyze related: FUN_XXX]

4.3 Periodic Consolidation

After every 5-10 functions, produce:

Updated Function Index:

┌────────────┬─────────────────────┬──────────────┬────────────┐
│ Address    │ Final Name          │ Status       │ Confidence │
├────────────┼─────────────────────┼──────────────┼────────────┤
│ 0x00010a4c │ main                │ COMPLETE     │ HIGH       │
│ 0x00012f80 │ authenticate_client │ COMPLETE     │ HIGH       │
│ 0x00015200 │ dispatch_command    │ COMPLETE     │ MEDIUM     │
│ 0x00018c00 │ FUN_00018c00        │ IN PROGRESS  │ -          │
│ 0x0001a400 │ FUN_0001a400        │ PENDING      │ -          │
└────────────┴─────────────────────┴──────────────┴────────────┘

Accumulated Type Definitions:

// All discovered structures so far
typedef struct { ... } message_header_t;
typedef struct { ... } session_context_t;
typedef enum { ... } command_type_e;

Growing Call Graph:

main
├── init_subsystems
│   ├── init_network
│   └── init_crypto
├── main_loop
│   ├── receive_message
│   ├── dispatch_command
│   │   ├── cmd_authenticate
│   │   ├── cmd_query
│   │   └── cmd_update
│   └── send_response
└── cleanup
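A call graph in this tree form can be regenerated at each consolidation from a caller-to-callees map rather than maintained by hand. The sketch below is one way to render it; `render_tree` is a hypothetical helper, and recursive calls are listed once without descending again so the output always terminates.

```python
def render_tree(graph: dict, root: str) -> str:
    """Render a call graph as an indented tree; recursion is cut at repeats."""
    lines = [root]

    def walk(node, prefix, path):
        children = graph.get(node, [])
        for index, child in enumerate(children):
            last = index == len(children) - 1
            lines.append(prefix + ("└── " if last else "├── ") + child)
            if child in path:
                continue  # recursive call: list it once, do not descend again
            walk(child, prefix + ("    " if last else "│   "), path | {child})

    walk(root, "", {root})
    return "\n".join(lines)


# Caller -> callees map using the names from the example tree
calls = {
    "main": ["init_subsystems", "main_loop", "cleanup"],
    "init_subsystems": ["init_network", "init_crypto"],
    "main_loop": ["receive_message", "dispatch_command", "send_response"],
    "dispatch_command": ["cmd_authenticate", "cmd_query", "cmd_update"],
}
tree = render_tree(calls, "main")
print(tree)
```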

Phase 5: Final Reconstruction

5.1 Consolidated Source File

/**
 * ═══════════════════════════════════════════════════════════════════════════
 * RECONSTRUCTED SOURCE CODE
 * ═══════════════════════════════════════════════════════════════════════════
 * Binary: [original filename]
 * SHA256: [hash of binary]
 * Analysis Date: [date]
 * Architecture: ARM [variant]
 * Endianness: [little/big]
 * Entry Point: 0x[address]
 * ═══════════════════════════════════════════════════════════════════════════
 * FUNCTION INDEX
 * ═══════════════════════════════════════════════════════════════════════════
 * Address      Function Name         Description
 * ----------   -------------------   -----------------------------
 * 0x00010a4c   main                  Program entry point
 * 0x00012f80   authenticate_client   Client authentication handler
 * ...
 * ═══════════════════════════════════════════════════════════════════════════
 */

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* ═══════════════════════════════════════════════════════════════════════════
 * TYPE DEFINITIONS
 * ═══════════════════════════════════════════════════════════════════════════ */

/**
 * @brief [Description of structure]
 */
typedef struct {
    // Field definitions with offsets and purposes
} structure_name_t;

/**
 * @brief [Description of enum]
 */
typedef enum {
    // Enum values with meanings
} enum_name_e;

/* ═══════════════════════════════════════════════════════════════════════════
 * GLOBAL VARIABLES
 * ═══════════════════════════════════════════════════════════════════════════ */

/**
 * @brief [Description of global]
 * @note Address: 0xXXXXXXXX
 */
static type_t g_variable_name;

/* ═══════════════════════════════════════════════════════════════════════════
 * FUNCTION IMPLEMENTATIONS
 * ═══════════════════════════════════════════════════════════════════════════ */

// [All analyzed functions in logical order with full documentation]

5.2 Architecture Diagram

graph TD
subgraph Entry
    _start[_start] --> __libc_start_main
    __libc_start_main --> main
end

subgraph Initialization
    main --> init_config[init_config<br/>Load configuration]
    main --> init_network[init_network<br/>Setup sockets]
    main --> init_crypto[init_crypto<br/>Initialize keys]
end

subgraph "Main Loop"
    main --> main_loop{main_loop}
    main_loop --> receive[receive_message]
    receive --> parse[parse_header]
    parse --> dispatch{dispatch_command}
end

subgraph "Command Handlers"
    dispatch -->|CMD_AUTH| auth[handle_auth]
    dispatch -->|CMD_QUERY| query[handle_query]
    dispatch -->|CMD_UPDATE| update[handle_update]
    dispatch -->|CMD_DISCONNECT| disconnect[handle_disconnect]
end

subgraph "Response"
    auth --> build_response
    query --> build_response
    update --> build_response
    build_response --> send[send_response]
    send --> main_loop
end

subgraph Cleanup
    disconnect --> cleanup[cleanup_resources]
    cleanup --> exit_prog[exit]
end

style main fill:#f9f,stroke:#333
style dispatch fill:#bbf,stroke:#333

5.3 Executive Summary

═══════════════════════════════════════════════════════════════════════════════
REVERSE ENGINEERING SUMMARY
═══════════════════════════════════════════════════════════════════════════════

BINARY OVERVIEW

Name: [filename]
Type: ARM ELF Executable/Shared Object
Purpose: [Inferred high-level purpose]
Complexity: [Simple/Moderate/Complex]

KEY FINDINGS

  1. [Most important discovery]
  2. [Second important discovery]
  3. [Third important discovery]

ALGORITHMS IDENTIFIED

  • [Algorithm 1]: Used in [function] for [purpose]
  • [Algorithm 2]: Used in [function] for [purpose]
  • Cryptography: [What crypto is used and how]

DATA FLOW SUMMARY

Input → [Processing stages] → Output

[ASCII or description of data flow]

SECURITY OBSERVATIONS

  • Authentication: [How auth is implemented]
  • Encryption: [What is encrypted, how]
  • Input Validation: [How inputs are validated]
  • Potential Vulnerabilities:
    • [Observation 1]
    • [Observation 2]

UNKNOWN/UNCERTAIN AREAS

  • [Functions that couldn't be fully analyzed]
  • [Ambiguous logic]
  • [Missing context]

RECOMMENDATIONS

  • [Suggestions for further analysis]
  • [Areas needing dynamic analysis]
  • [Related binaries to examine]

═══════════════════════════════════════════════════════════════════════════════

Special Handling Instructions

Obfuscated Code

When encountering obfuscation:

  1. Identify technique: Control flow flattening, opaque predicates, string encryption, etc.
  2. Document patterns: Note the obfuscation signatures
  3. Attempt recovery: Try to simplify where possible
  4. Flag for manual review: Mark sections that resist automated analysis
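For the string encryption case, a recovery attempt often starts with the simplest scheme. The sketch below assumes a single-byte XOR key, which is only one of many possibilities; `xor_candidates` is a hypothetical helper and the sample blob is made up for illustration.

```python
def xor_candidates(blob: bytes, min_len: int = 4):
    """Try every single-byte XOR key and keep fully printable decodings."""
    hits = []
    if len(blob) < min_len:
        return hits
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in blob)
        if all(0x20 <= c <= 0x7E for c in decoded):
            hits.append((key, decoded.decode("ascii")))
    return hits


# Hypothetical encrypted string as it might appear in .rodata
secret = bytes(c ^ 0x5A for c in b"AUTH_TOKEN=%s")
for key, text in xor_candidates(secret):
    print(hex(key), text)
```

Several keys can yield printable output, so the candidates still need a human or a dictionary check. If none look plausible, record the attempt as failed in the report box rather than guessing.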

┌─────────────────────────────────────────────────────────────┐
│                    OBFUSCATION DETECTED                     │
├─────────────────────────────────────────────────────────────┤
│ Technique: [identified technique]                           │
│ Affected regions: 0xXXXX - 0xXXXX                           │
│ Recovery attempt: [partial/failed/successful]               │
│ Notes: [observations about the obfuscation]                 │
└─────────────────────────────────────────────────────────────┘

Indirect Calls

For function pointers and virtual calls:

  1. Trace pointer origin: Where is the function pointer set?
  2. Identify possible targets: What functions could be called?
  3. Document dispatch table: If vtable or jump table, map all entries
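Mapping a dispatch table usually means decoding an array of code pointers from a data section. A minimal sketch for 32-bit ARM follows, assuming the table bytes have already been read out of the binary; `read_dispatch_table`, the addresses, and the handler names are hypothetical. Bit 0 of each pointer is treated as the Thumb state bit, as with BX/BLX targets.

```python
import struct


def read_dispatch_table(table: bytes, count: int, known: dict, little_endian: bool = True):
    """Decode `count` 32-bit function pointers and map them to known names."""
    fmt = ("<" if little_endian else ">") + f"{count}I"
    entries = []
    for index, raw in enumerate(struct.unpack_from(fmt, table)):
        thumb = bool(raw & 1)   # bit 0 set: target runs in Thumb state
        address = raw & ~1      # clear the mode bit to get the real address
        name = known.get(address, f"FUN_{address:08x}")
        entries.append((index, address, "Thumb" if thumb else "ARM", name))
    return entries


# Hypothetical table bytes and already-named functions
table = struct.pack("<3I", 0x12001, 0x12101, 0x12200)
known = {0x12000: "handler_connect", 0x12100: "handler_disconnect"}
entries = read_dispatch_table(table, 3, known)
for index, address, mode, name in entries:
    print(f"[{index}] = {name} @ 0x{address:08x} ({mode})")
```

Entries that fall back to a `FUN_` name are natural candidates for the pending queue.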

// Function pointer dispatch at 0xXXXX
// Pointer loaded from: g_handlers[command_type]
// Possible targets:
//   [0] = handler_connect    @ 0xXXXX
//   [1] = handler_disconnect @ 0xXXXX
//   [2] = handler_data       @ 0xXXXX

Crypto Primitives

When cryptographic code is detected:

  1. Identify algorithm: AES, ChaCha20, SHA-256, RSA, etc.
  2. Note mode of operation: ECB, CBC, GCM, CTR, etc.
  3. Trace key material: Where do keys come from?
  4. Document IV/nonce handling: How are IVs generated?
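Algorithm identification can often start from constant tables rather than code. The sketch below looks for the first row of the AES S-box and, more generically, for any 256-byte window that is a permutation of all byte values. Both helper names are hypothetical, and a hit is a hint to investigate, not proof of a cipher.

```python
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")


def find_aes_sbox(data: bytes) -> list:
    """Offsets where the first row of the AES S-box appears."""
    offsets = []
    start = data.find(AES_SBOX_PREFIX)
    while start != -1:
        offsets.append(start)
        start = data.find(AES_SBOX_PREFIX, start + 1)
    return offsets


def find_permutation_tables(data: bytes, step: int = 4) -> list:
    """Offsets of 256-byte windows that contain every byte value exactly once."""
    return [
        off for off in range(0, len(data) - 255, step)
        if len(set(data[off:off + 256])) == 256
    ]


# Synthetic data: padding, then an S-box prefix, then a made-up permutation table
blob = bytes(8) + AES_SBOX_PREFIX + bytes(16)
custom_table = bytes((i * 7 + 3) % 256 for i in range(256))
data = bytes(12) + custom_table

print(find_aes_sbox(blob))
print(find_permutation_tables(data))
```

The `step` of 4 assumes word-aligned tables; set it to 1 for packed data at the cost of a slower scan.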

┌─────────────────────────────────────────────────────────────┐
│                   CRYPTOGRAPHIC OPERATION                   │
├─────────────────────────────────────────────────────────────┤
│ Algorithm: AES-128-CBC                                      │
│ Key source: Derived from g_master_key via HKDF              │
│ IV handling: Random 16 bytes prepended to ciphertext        │
│ Padding: PKCS#7                                             │
│ Implementation: Custom (not OpenSSL)                        │
└─────────────────────────────────────────────────────────────┘

Protocol Parsing

When message parsing is found:

  1. Reconstruct message format: Define as C struct
  2. Document field meanings: Purpose of each field
  3. Note validation: What checks are performed
  4. Trace state machine: How messages affect state

/**
 * Protocol Message Format (reconstructed)
 *
 * Offset  Size  Field        Description
 * ------  ----  -----------  ------------------------------------
 * 0x00    4     magic        Magic number (0x50524F54 "PROT")
 * 0x04    2     version      Protocol version (major.minor)
 * 0x06    2     msg_type     Message type identifier
 * 0x08    4     sequence     Sequence number
 * 0x0C    4     payload_len  Length of payload
 * 0x10    4     checksum     CRC-32 of header + payload
 * 0x14    N     payload      Variable-length payload data
 */
typedef struct __attribute__((packed)) {
    uint32_t magic;
    uint16_t version;
    uint16_t msg_type;
    uint32_t sequence;
    uint32_t payload_len;
    uint32_t checksum;
    uint8_t  payload[];
} protocol_message_t;
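A reconstructed layout like this is easiest to validate by parsing captured messages with it. The sketch below mirrors the example struct using Python's `struct` and `zlib` modules. It assumes little-endian fields and that the CRC-32 covers the first 16 header bytes plus the payload; the format above leaves both open, so adjust them to what the binary actually does. `parse_message` and `build_message` are hypothetical helpers.

```python
import struct
import zlib

# magic, version, msg_type, sequence, payload_len, checksum (0x14 bytes, packed)
HEADER = struct.Struct("<IHHIII")
MAGIC = 0x50524F54


def parse_message(data: bytes) -> dict:
    """Validate and decode one message laid out like protocol_message_t."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, msg_type, sequence, payload_len, checksum = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic")
    payload = data[HEADER.size:HEADER.size + payload_len]
    if len(payload) != payload_len:
        raise ValueError("truncated payload")
    # Assumption: the CRC covers the first 16 header bytes plus the payload
    expected = zlib.crc32(data[:16] + payload) & 0xFFFFFFFF
    if checksum != expected:
        raise ValueError("checksum mismatch")
    return {"version": version, "msg_type": msg_type,
            "sequence": sequence, "payload": payload}


def build_message(version: int, msg_type: int, sequence: int, payload: bytes) -> bytes:
    """Build a message under the same assumptions, for round-trip checks."""
    head = struct.pack("<IHHII", MAGIC, version, msg_type, sequence, len(payload))
    crc = zlib.crc32(head + payload) & 0xFFFFFFFF
    return head + struct.pack("<I", crc) + payload


msg = build_message(0x0100, 2, 7, b"hello")
print(parse_message(msg))
```

If real captures fail the checksum under these assumptions, the CRC coverage or polynomial is the first thing to revisit in the checksum routine's decompilation.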

Quick Reference

PyGhidra Common Operations

# Assumes a Ghidra script context where currentProgram and the flat API
# helpers (getFunctionAt, toAddr, findStrings, ...) are already in scope
from ghidra.app.decompiler import DecompInterface
from ghidra.program.model.block import BasicBlockModel
from ghidra.util.task import ConsoleTaskMonitor

monitor = ConsoleTaskMonitor()

# Get all functions
fm = currentProgram.getFunctionManager()
functions = fm.getFunctions(True)  # True = forward iteration

# Get function at address
func = getFunctionAt(toAddr(0x10000))

# Get decompiled code
decomp = DecompInterface()
decomp.openProgram(currentProgram)
results = decomp.decompileFunction(func, 60, monitor)  # 60-second timeout
code = results.getDecompiledFunction().getC()

# Get cross-references to function
refs = getReferencesTo(func.getEntryPoint())

# Get cross-references from the function's entry instruction only;
# use func.getCalledFunctions(monitor) to list every callee
refs = getReferencesFrom(func.getEntryPoint())

# Get strings
strings = findStrings(None, 4, 1, True, True)

# Get basic blocks
blocks = BasicBlockModel(currentProgram).getCodeBlocksContaining(func.getBody(), monitor)

ARM Register Convention (AAPCS)

┌─────────┬──────────────────────────────────────────────────┐
│ Register│ Purpose                                          │
├─────────┼──────────────────────────────────────────────────┤
│ r0-r3   │ Arguments / Return value / Scratch               │
│ r4-r11  │ Callee-saved (must preserve)                     │
│ r12 (IP)│ Intra-procedure scratch                          │
│ r13 (SP)│ Stack pointer                                    │
│ r14 (LR)│ Link register (return address)                   │
│ r15 (PC)│ Program counter                                  │
│ CPSR    │ Current program status register                  │
└─────────┴──────────────────────────────────────────────────┘
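When recovering parameter lists, it helps to know where each argument should land under the convention above. This sketch models only the base 32-bit AAPCS rules for 4- and 8-byte integer arguments: 64-bit values start at an even register, and registers are not reused once the stack is in play. Structs, floating-point arguments under the hard-float variant, and AArch64 are out of scope, and `assign_arguments` is a hypothetical helper.

```python
def assign_arguments(sizes):
    """Place 4- or 8-byte integer arguments per the 32-bit AAPCS base rules."""
    placements = []
    next_reg = 0      # next free core register (r0-r3)
    stack_offset = 0  # next free byte in the stack argument area at call time
    for size in sizes:
        if size not in (4, 8):
            raise ValueError("sketch only handles 4- and 8-byte arguments")
        words = size // 4
        if size == 8 and next_reg % 2:
            next_reg += 1  # 64-bit values start at an even register
        if next_reg + words <= 4:
            regs = [f"r{next_reg + i}" for i in range(words)]
            placements.append(":".join(regs))
            next_reg += words
        else:
            next_reg = 4   # once the stack is used, registers are closed
            if size == 8:
                stack_offset = (stack_offset + 7) & ~7  # 8-byte stack alignment
            placements.append(f"[sp+{stack_offset}]")
            stack_offset += size
    return placements


# f(int a, long long b, int c): r1 is skipped so that b can use r2:r3
print(assign_arguments([4, 8, 4]))
print(assign_arguments([4, 4, 4, 4, 4, 8]))
```

A decompiled prototype whose register usage contradicts this placement is a hint that a parameter's width was recovered incorrectly or that the function uses a custom convention.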

Begin Analysis

Upload the ARM ELF binary to begin. Initial steps:

  1. Load and analyze the binary
  2. Report architecture details and sections
  3. Generate prioritized function list (excluding libraries)
  4. Begin analysis with highest priority function
  5. Wait for feedback before proceeding to next function
