@KunYi
Last active January 16, 2026 07:33
A specification of MCU OTA

MCU OTA System Specification

1. System Architecture

The OTA system is composed of three main layers:

1.1 Device Layer (MCU Side)

  • MCU Firmware: Bootloader + Application
  • OTA Agent: Handles download, verification, and flash writing
  • Security Module: RSA/ECDSA signature verification, AES encryption/decryption

1.2 Cloud Layer

  • Device Management Service
    • Device registration and authentication
    • Device grouping and tagging (Region / HW version / Model / Customer)
    • OTA status tracking (LastSeen, Update Status, Retry Count)
  • OTA Release Service
    • Firmware storage (e.g., via CDN)
    • Release rules per device group
    • Metadata (version, checksum, release strategy)
  • Security & Authentication
    • Token-based device authentication (JWT / X.509)
    • Firmware signature verification

1.3 User Layer

  • Web / Mobile Portal
    • View device list and status
    • Publish OTA versions
    • Configure OTA strategy (rollout %, schedule, group)
    • View OTA history and logs

2. Device Management Design

2.1 Device Registration

  • On first boot, MCU sends device_info to the cloud:
    • Device ID (UUID or MCU UID)
    • HW version / FW version
    • Network information
  • Cloud returns a Device Token or certificate
  • Device enters Active status upon successful registration
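
The registration exchange above can be sketched as follows. This is a minimal illustration, not a defined wire format: the JSON field names (`device_id`, `hw_version`, `fw_version`, `network`, `device_token`, `certificate`) and the `Unregistered` status are assumptions chosen for the example.

```python
import json

def build_device_info(device_id, hw_version, fw_version, network):
    """Assemble the first-boot registration payload as a JSON string.

    All field names are hypothetical; the spec only lists what is sent.
    """
    return json.dumps(
        {
            "device_id": device_id,    # UUID or MCU UID
            "hw_version": hw_version,
            "fw_version": fw_version,
            "network": network,        # e.g. {"type": "wifi"}
        },
        sort_keys=True,
    )

def handle_registration_response(response):
    """Return (status, credential) from the cloud's reply.

    The device becomes Active only if a token or certificate came back.
    """
    credential = response.get("device_token") or response.get("certificate")
    if credential:
        return ("Active", credential)
    return ("Unregistered", None)
```

A device would store the returned credential in protected storage and present it on every later check-in.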

2.2 Device Grouping

  • Group by Region, HW version, Customer, or Beta/Stable tier
  • OTA rollout can target specific groups

2.3 OTA Status Tracking

Each device reports:

  • Current FW version
  • Update status (Idle / Downloading / Updating / Success / Fail)
  • Update progress (%)
  • Last connection time
  • Rollback status (if any)
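
One possible shape for the status report is sketched below. The class and field names are hypothetical; only the list of reported items and the five status values come from this section.

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional

class UpdateStatus(str, Enum):
    IDLE = "Idle"
    DOWNLOADING = "Downloading"
    UPDATING = "Updating"
    SUCCESS = "Success"
    FAIL = "Fail"

@dataclass
class OtaStatusReport:
    fw_version: str
    update_status: UpdateStatus
    progress_percent: int                  # 0..100
    last_seen: str                         # ISO 8601 timestamp (assumed format)
    rollback_status: Optional[str] = None  # None when no rollback occurred

    def __post_init__(self):
        if not 0 <= self.progress_percent <= 100:
            raise ValueError("progress_percent must be between 0 and 100")

    def to_dict(self):
        """Plain-dict form, ready for JSON serialization."""
        data = asdict(self)
        data["update_status"] = self.update_status.value
        return data
```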

2.4 Rollback / Fail-Safe

  • Bootloader manages two partitions:
    • A (current)
    • B (new)
  • If the new firmware in partition B fails to verify or boot, the bootloader automatically rolls back to partition A
  • Cloud can flag devices for rollback
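
The A/B fail-safe can be modeled as a small state record that the bootloader consults on every reset. The sketch below is a host-side model in Python, not bootloader code. It assumes a boot-attempt counter with a limit of 3 and an explicit "confirm" call from the application; neither is stated in this spec. Cloud-flagged rollback is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BOOT_ATTEMPTS = 3  # assumed limit before giving up on the new image

@dataclass
class BootState:
    active: str = "A"              # last partition known to be good
    pending: Optional[str] = None  # newly written partition, not yet confirmed
    boot_attempts: int = 0         # boots tried on the pending partition

def select_boot_partition(state):
    """Run by the bootloader on each reset; returns the partition to boot."""
    if state.pending is None:
        return state.active
    if state.boot_attempts >= MAX_BOOT_ATTEMPTS:
        # The new image never confirmed itself: roll back.
        state.pending = None
        state.boot_attempts = 0
        return state.active
    state.boot_attempts += 1
    return state.pending

def confirm_boot(state):
    """Run by the application once it is healthy on the new image."""
    if state.pending is not None:
        state.active = state.pending
        state.pending = None
        state.boot_attempts = 0
```

On real hardware this record would live in a dedicated flash sector or backup register so it survives resets.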

3. OTA Update Workflow

Recommended: staged rollout + chunked updates + verification

3.1 Update Strategy

  • Support Full Update or Delta Update
  • Staged rollout:
    • Update 5% of devices first, verify stability, then expand
  • Schedule:
    • Run OTA during idle or off-peak periods
    • Avoid interrupting critical operations
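
One common way to pick "5% of devices" is to hash each device ID into a stable bucket from 0 to 99 and admit buckets below the rollout percentage. The sketch below assumes this approach; the function names and the `release_id` salt are hypothetical.

```python
import hashlib

def rollout_bucket(device_id, release_id):
    """Map a device to a stable bucket in 0..99 for a given release."""
    digest = hashlib.sha256(f"{release_id}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100

def in_rollout(device_id, release_id, rollout_percent):
    """True if the device is included at the current rollout percentage."""
    return rollout_bucket(device_id, release_id) < rollout_percent
```

Because the bucket is fixed per device and release, raising the percentage from 5 to 20 only adds devices; the devices already updated stay in the rollout.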

3.2 OTA Update Flow

  1. Cloud publishes OTA

    • Upload firmware + metadata
    • Define target group / rollout %
    • Metadata includes: version, size, hash, signature, release notes
  2. MCU checks for updates

    • OTA agent polls cloud periodically
    • Compares metadata to decide whether to download
  3. Firmware download

    • Protocols: HTTP(S) / MQTT / CoAP
    • Support chunked download to handle MCU RAM/Flash limits
    • Each chunk verified by CRC/hash
  4. Firmware verification

    • Verify integrity (hash)
    • Verify signature (RSA/ECDSA)
    • Reject invalid firmware and report failure
  5. Flash write

    • Bootloader manages A/B partitions or temp buffer
    • Support resume for interrupted downloads
  6. Partition switch & reboot

    • Bootloader switches to new partition
    • Report update result to cloud
  7. Rollback mechanism

    • MCU detects failed boot and rolls back
    • Cloud can enforce rollback if needed
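
Steps 2 to 4 of the flow above can be sketched as follows. The sketch assumes dotted numeric version strings, CRC-32 per chunk, and SHA-256 for the whole image; the metadata keys `version`, `size`, and `hash` mirror the metadata list in step 1, but their exact names are assumptions. Signature verification (RSA/ECDSA) needs a crypto library and is left out. The image is collected in memory only to keep the example short; an MCU would write each verified chunk to flash.

```python
import hashlib
import zlib

def should_download(current_version, metadata):
    """Step 2: download only if the published version is newer."""
    def parse(version):
        return tuple(int(part) for part in version.split("."))
    return parse(metadata["version"]) > parse(current_version)

def receive_firmware(chunks, metadata):
    """Steps 3-4: check each chunk's CRC, then the whole-image size and hash.

    `chunks` is an iterable of (data, crc32) pairs.
    Raises ValueError on any mismatch so the caller can report failure.
    """
    image = bytearray()
    hasher = hashlib.sha256()
    for index, (data, crc) in enumerate(chunks):
        if zlib.crc32(data) != crc:
            raise ValueError(f"chunk {index} failed CRC check")
        hasher.update(data)
        image.extend(data)
    if len(image) != metadata["size"]:
        raise ValueError("image size does not match metadata")
    if hasher.hexdigest() != metadata["hash"]:
        raise ValueError("image hash does not match metadata")
    return bytes(image)
```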

4. Security & Reliability

  1. Communication Security

    • HTTPS / TLS 1.2+ (DTLS where CoAP runs over UDP)
    • Device Token / X.509 authentication
  2. Firmware Security

    • Firmware signing (RSA/ECDSA)
    • Prevent tampering
  3. Redundancy

    • Dual partition write for fail-safe
    • Chunk verification to prevent corruption
  4. Logging & Monitoring

    • Record each OTA success/failure
    • Exportable CSV/JSON logs for analysis
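
A minimal sketch of the log export, using only the standard library. The record fields (`device_id`, `fw_version`, `result`, `reason`) are assumptions; the spec only requires that each success or failure be recorded and exportable.

```python
import csv
import io
import json

FIELDS = ["device_id", "fw_version", "result", "reason"]  # hypothetical schema

def export_json(records):
    """Serialize OTA result records as a JSON array."""
    return json.dumps(records, indent=2)

def export_csv(records):
    """Serialize OTA result records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```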

5. User Interface Design

  1. Device Overview

    • List devices, FW versions, Online/Offline
    • OTA progress bar and status
  2. OTA Publish

    • Select firmware version
    • Target device group / rollout %
    • Schedule updates
  3. History & Reporting

    • Success rate
    • Failure reasons
    • Rollback records

6. Advanced Considerations

  • Delta Update: Reduces transfer size for low-resource MCUs
  • Concurrent device limits: Cap simultaneous downloads to avoid network congestion
  • Retry strategy: Automatic retry with exponential backoff
  • A/B testing: Beta vs. Stable groups for phased rollout
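
The retry strategy can be sketched as exponential backoff with a cap and random jitter, so that many devices failing at once do not retry in lockstep. The base delay (30 s), cap (1 h), and the "full jitter" variant are assumptions, not values from this spec.

```python
import random

def backoff_delay(attempt, base=30.0, cap=3600.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles each attempt up to `cap`; the actual delay is
    drawn uniformly from [0, ceiling) to spread retries out.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Passing a fixed `rng` makes the schedule deterministic for testing; with `rng=lambda: 1.0` the delays for attempts 0 to 3 are 30, 60, 120, and 240 seconds.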

7. Rollout Automation

  • Goal: Automatic small-batch updates (Rollout) to reduce risk and control traffic
  • Key Parameters:
    • Rollout % or Batch Size
    • Interval / Delay between batches
    • Max Retry per device
    • Fail Threshold (pause rollout if too many failures)
  • Automation Flow:
    1. Cloud divides devices into batches based on rollout parameters
    2. Each batch is scheduled at a specific time
    3. Devices check in with the cloud to see if their batch is ready
    4. Cloud allows the ready batch to download firmware
    5. Status is tracked; the next batch is released only if the success rate is acceptable
  • Benefits:
    • Avoids network congestion
    • Allows early detection of firmware issues
    • Supports rollback if needed
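
The batching and fail-threshold gate can be sketched in a few lines. The function names and the three action strings are hypothetical, and the threshold is treated as a failure-rate fraction, which is one of several reasonable interpretations of "Fail Threshold".

```python
def make_batches(device_ids, batch_size):
    """Split the target devices into consecutive batches."""
    return [
        device_ids[i:i + batch_size]
        for i in range(0, len(device_ids), batch_size)
    ]

def next_action(results, fail_threshold):
    """Decide what the scheduler does after a batch reports back.

    `results` holds "Success" or "Fail" for each finished device.
    Returns "wait" (no results yet), "pause", or "release_next".
    """
    if not results:
        return "wait"
    fail_rate = results.count("Fail") / len(results)
    if fail_rate > fail_threshold:
        return "pause"
    return "release_next"
```

For example, with a threshold of 0.2, one failure in ten devices releases the next batch, while two failures in three pauses the rollout.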

8. OTA Flow: MQTT vs CoAP

flowchart TD
    %% Nodes
    UP[User Portal: Publish OTA & Rollout Schedule]
    C1[Cloud: Device Management & Rollout Scheduler]
    C2[Cloud: Firmware Storage & Metadata]
    M1[MCU: Check-in / Poll for Update]
    M2[MCU: Download Firmware #40;chunked#41;]
    M3[MCU: Verify & Write Flash]
    M4[MCU: Reboot & Report Status]
    M5[MCU: Rollback if Failed]

    %% MQTT Path (persistent connection, cloud pushes)
    UP --> C1
    C1 --> C2
    C2 --> M1

    %% CoAP Path (intermittent / low bandwidth, device polls)
    M1 -.-> C1
    C1 -.-> M1

    %% Shared Steps
    M1 --> M2
    M2 --> M3
    M3 --> M4
    M4 --> C1
    M3 -- Fail --> M5
    M5 --> C1

    %% Styling
    style UP fill:#bfb,stroke:#333,stroke-width:1px
    style C1 fill:#bbf,stroke:#333,stroke-width:1px
    style C2 fill:#bbf,stroke:#333,stroke-width:1px
    style M1 fill:#f9f,stroke:#333,stroke-width:1px
    style M2 fill:#d0f0fd,stroke:#333,stroke-width:1px
    style M3 fill:#d0f0fd,stroke:#333,stroke-width:1px
    style M4 fill:#bbf,stroke:#333,stroke-width:1px
    style M5 fill:#fcc,stroke:#333,stroke-width:1px

Diagram Explanation

  1. MQTT
    • Persistent connection
    • Immediate push of OTA batches
    • Suitable for Wi-Fi / Ethernet MCU
  2. CoAP
    • Short connection, intermittent polling
    • Suitable for Cellular / NB-IoT MCU
    • OTA delay depends on polling interval
  3. Shared Steps
    • MCU downloads firmware in chunks
    • Verifies, writes flash, reboots
    • Reports status to Cloud
    • Automatic rollback on failure

9. Traffic and Power Optimization

  • Use chunked OTA to reduce retry cost
  • Use Delta Update to reduce firmware size
  • For Cellular MCU:
    • Keep check-in sessions short (connect, query, disconnect)
    • Sleep after download/verification
  • Rollout batches reduce network peak load
  • CoAP Observe or MQTT push lets devices learn of updates without frequent polling, balancing immediacy against traffic
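
The saving from chunked OTA can be made concrete with a small calculation. The sketch assumes that a non-chunked transfer restarts from byte 0 after a failure, while a chunked transfer resumes from the start of the interrupted chunk; the function name is hypothetical.

```python
def bytes_wasted_on_failure(image_size, chunk_size, failed_offset):
    """Bytes already sent that must be sent again after a break.

    Returns (whole_image_restart, chunked_resume).
    """
    if not 0 <= failed_offset <= image_size:
        raise ValueError("failed_offset must lie within the image")
    whole_image_restart = failed_offset  # everything sent so far is lost
    chunk_start = (failed_offset // chunk_size) * chunk_size
    chunked_resume = failed_offset - chunk_start  # only the partial chunk
    return whole_image_restart, chunked_resume
```

For a 256 KiB image with 4 KiB chunks, a break at byte 200,000 wastes 200,000 bytes on a full restart but only 3,392 bytes when resuming at the chunk boundary.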

Appendix A: Device Lifecycle Diagram

flowchart TD
    %% Device States
    A[Power On / First Boot] --> B[Device Registration]
    B --> C{Registration Success?}
    C -- Yes --> D[Active / Idle]
    C -- No --> E[Retry Registration / Alert]

    %% OTA Process
    D --> F{OTA Available?}
    F -- No --> D
    F -- Yes --> G[Download Firmware Chunked]
    G --> H[Verify & Write Flash]
    H --> I{Verification Success?}
    I -- Yes --> J[Reboot & Activate New FW]
    I -- No --> K[Rollback to Previous FW]
    K --> D

    %% Reporting
    J --> L[Report OTA Success to Cloud]
    K --> M[Report OTA Failure / Rollback]

    %% Device Lifecycle End
    D --> N[Decommission / Retire Device]
    L --> D
    M --> D

    %% Styling
    style A fill:#fef2c0,stroke:#333,stroke-width:1px
    style B fill:#bbf,stroke:#333,stroke-width:1px
    style C fill:#f9f,stroke:#333,stroke-width:1px
    style D fill:#bfb,stroke:#333,stroke-width:1px
    style E fill:#fcc,stroke:#333,stroke-width:1px
    style F fill:#ffe0b3,stroke:#333,stroke-width:1px
    style G fill:#d0f0fd,stroke:#333,stroke-width:1px
    style H fill:#d0f0fd,stroke:#333,stroke-width:1px
    style I fill:#f9f,stroke:#333,stroke-width:1px
    style J fill:#bbf,stroke:#333,stroke-width:1px
    style K fill:#fcc,stroke:#333,stroke-width:1px
    style L fill:#bfb,stroke:#333,stroke-width:1px
    style M fill:#f99,stroke:#333,stroke-width:1px
    style N fill:#ddd,stroke:#333,stroke-width:1px

Diagram Explanation

  1. Power On / Registration
    • Device starts and registers with cloud
    • Registration may retry on failure
  2. Active / Idle
    • Device waits for OTA or normal operation
  3. OTA Process
    • MCU checks in with Cloud for available updates
    • Downloads firmware in chunks
    • Verifies and writes to flash
    • Reboots to activate new firmware
    • If verification fails → automatic rollback
  4. Reporting
    • Cloud receives OTA success or failure report
  5. Decommission / Retire
    • Device can be retired when no longer used
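
The lifecycle above can also be written as a transition table, which makes illegal transitions easy to reject. The state and event names below are hypothetical labels for the diagram's nodes and edges, not identifiers defined by this spec.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("PowerOn", "boot"): "Registering",
    ("Registering", "registered"): "Idle",
    ("Registering", "register_failed"): "Registering",  # retry
    ("Idle", "update_available"): "Downloading",
    ("Downloading", "download_done"): "Verifying",
    ("Verifying", "verify_ok"): "Activating",
    ("Verifying", "verify_failed"): "RolledBack",
    ("Activating", "reported"): "Idle",
    ("RolledBack", "reported"): "Idle",
    ("Idle", "retire"): "Retired",
}

def step(state, event):
    """Return the next lifecycle state, or raise ValueError if not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}") from None
```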