The OTA system is composed of three main layers:
- MCU
  - MCU Firmware: Bootloader + Application
  - OTA Agent: Handles download, verification, and flash writing
  - Security Module: RSA/ECDSA verification, AES encryption
- Cloud
  - Device Management Service
    - Device registration and authentication
    - Device grouping and tagging (Region / HW version / Model / Customer)
    - OTA status tracking (LastSeen, Update Status, Retry Count)
  - OTA Release Service
    - Firmware storage (e.g., via CDN)
    - Release rules per device group
    - Metadata (version, checksum, release strategy)
  - Security & Authentication
    - Token-based device authentication (JWT / X.509)
    - Firmware signature verification
- User Portal
  - Web / Mobile Portal
    - View device list and status
    - Publish OTA versions
    - Configure OTA strategy (rollout %, schedule, group)
    - View OTA history and logs
- On first boot, the MCU sends `device_info` to the cloud:
  - Device ID (UUID or MCU UID)
  - HW version / FW version
  - Network information
- Cloud returns a Device Token or certificate
- Device enters Active status upon successful registration
- Group by Region, HW version, Customer, or Beta/Stable tier
- OTA rollout can target specific groups
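The registration payload and group targeting described above can be sketched as follows. This is a minimal illustration, not the actual wire format: the `DeviceInfo` field names, the JSON encoding, and the `matches_group` helper are all assumptions made for the example.

```python
import json
import uuid
from dataclasses import dataclass, asdict


@dataclass
class DeviceInfo:
    """Hypothetical first-boot registration payload (field names are assumed)."""
    device_id: str
    hw_version: str
    fw_version: str
    network: str
    region: str = "unknown"
    tier: str = "stable"  # "beta" or "stable"


def build_registration_message(info: DeviceInfo) -> str:
    """Serialize device_info as JSON for the registration request."""
    return json.dumps(asdict(info), sort_keys=True)


def matches_group(info: DeviceInfo, rule: dict) -> bool:
    """Return True if every attribute named in the rule equals the device's value."""
    return all(getattr(info, key, None) == value for key, value in rule.items())


info = DeviceInfo(
    device_id=str(uuid.uuid4()),
    hw_version="B2",
    fw_version="1.0.0",
    network="wifi",
    region="EU",
    tier="beta",
)
message = build_registration_message(info)
targeted = matches_group(info, {"region": "EU", "tier": "beta"})
```

A rollout rule here is just a dictionary of required attribute values, so a release aimed at `{"region": "EU", "tier": "beta"}` would include this device and skip a stable-tier device in the same region.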
Each device reports:
- Current FW version
- Update status (Idle / Downloading / Updating / Success / Fail)
- Update progress (%)
- Last connection time
- Rollback status (if any)
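One possible shape for the status report listed above, assuming a JSON-like payload. The class and field names (`StatusReport`, `progress_pct`, `last_seen`, `rollback`) are illustrative choices, not part of any defined protocol.

```python
from dataclasses import dataclass, asdict
from enum import Enum


class UpdateStatus(Enum):
    IDLE = "Idle"
    DOWNLOADING = "Downloading"
    UPDATING = "Updating"
    SUCCESS = "Success"
    FAIL = "Fail"


@dataclass
class StatusReport:
    fw_version: str
    status: UpdateStatus
    progress_pct: int       # update progress, 0..100
    last_seen: int          # last connection time as a Unix timestamp
    rollback: bool = False  # True if the device rolled back

    def to_payload(self) -> dict:
        """Validate the report and convert it to a plain dictionary."""
        if not 0 <= self.progress_pct <= 100:
            raise ValueError("progress_pct must be between 0 and 100")
        payload = asdict(self)
        payload["status"] = self.status.value  # send the readable status name
        return payload


report = StatusReport("1.2.0", UpdateStatus.DOWNLOADING, 40, 1700000000)
payload = report.to_payload()
```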
- Bootloader manages two partitions:
  - A (current)
  - B (new)
- Failed OTA automatically rolls back to partition A
- Cloud can flag devices for rollback
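The A/B rollback behavior above can be modeled with a small boot-state record. This is a simulation sketch in Python rather than bootloader code, and it assumes a trial-boot counter with a limit of three attempts; `BootState`, `select_slot`, and `confirm_boot` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BOOT_ATTEMPTS = 3  # assumed limit before the new image is abandoned


@dataclass
class BootState:
    active: str = "A"              # last slot known to boot correctly
    pending: Optional[str] = None  # slot holding a new, unconfirmed image
    attempts: int = 0              # trial boots of the pending slot so far
    rolled_back: bool = False


def select_slot(state: BootState) -> str:
    """Run at every reset: pick the slot to boot and update the bookkeeping."""
    if state.pending is None:
        return state.active
    if state.attempts >= MAX_BOOT_ATTEMPTS:
        # The new image never confirmed itself: fall back to the old slot.
        state.pending = None
        state.attempts = 0
        state.rolled_back = True
        return state.active
    state.attempts += 1
    return state.pending


def confirm_boot(state: BootState) -> None:
    """Called by the application once it has started up in a healthy state."""
    if state.pending is not None:
        state.active = state.pending
        state.pending = None
        state.attempts = 0
```

In this model the new firmware must call `confirm_boot` after a healthy start. If it crashes or hangs on every trial boot, the fourth reset returns the device to partition A and sets `rolled_back`, which the device can then report to the cloud.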
Recommended: staged rollout + chunked updates + verification
- Support Full Update or Delta Update
- Staged rollout:
  - Update 5% of devices first, verify stability, then expand
- Schedule:
  - OTA can occur at idle/off-peak time
  - Avoid impacting critical operations
- Cloud publishes OTA
  - Upload firmware + metadata
  - Define target group / rollout %
  - Metadata includes: version, size, hash, signature, release notes
- MCU checks for updates
  - OTA agent polls cloud periodically
  - Compares metadata to decide whether to download
- Firmware download
  - Protocols: HTTP(S) / MQTT / CoAP
  - Support chunked download to handle MCU RAM/Flash limits
  - Each chunk verified by CRC/hash
- Firmware verification
  - Verify integrity (hash)
  - Verify signature (RSA/ECDSA)
  - Reject invalid firmware and report failure
- Flash write
  - Bootloader manages A/B partitions or temp buffer
  - Support resume for interrupted downloads
- Partition switch & reboot
  - Bootloader switches to new partition
  - Report update result to cloud
- Rollback mechanism
  - MCU detects failed boot and rolls back
  - Cloud can enforce rollback if needed
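The download and integrity steps above can be sketched as follows, assuming CRC-32 per chunk and SHA-256 over the whole image. The chunk record layout `(offset, data, crc)` and the function names are illustrative. Signature verification (RSA/ECDSA) is left out because it needs a cryptography library outside the Python standard library.

```python
import hashlib
import zlib
from typing import List, Optional, Tuple

CHUNK_SIZE = 4  # deliberately tiny for the example; real chunks are far larger

Chunk = Tuple[int, bytes, int]  # (offset, data, crc32 of data)


def make_chunks(image: bytes, chunk_size: int = CHUNK_SIZE) -> List[Chunk]:
    """Cloud side: split the image into chunk records with a CRC each."""
    chunks = []
    for offset in range(0, len(image), chunk_size):
        data = image[offset:offset + chunk_size]
        chunks.append((offset, data, zlib.crc32(data)))
    return chunks


def receive_image(chunks: List[Chunk], expected_sha256: str,
                  staged: Optional[bytearray] = None) -> bytes:
    """Device side: verify each chunk, append it, then check the full hash.

    `staged` holds bytes written before an interruption; chunks that are
    already covered are skipped (resume assumes a chunk-aligned boundary).
    """
    staged = bytearray() if staged is None else staged
    for offset, data, crc in chunks:
        if offset < len(staged):
            continue  # already written before the interruption
        if offset != len(staged):
            raise ValueError(f"gap before offset {offset}")
        if zlib.crc32(data) != crc:
            raise ValueError(f"CRC mismatch at offset {offset}")
        staged.extend(data)
    if hashlib.sha256(staged).hexdigest() != expected_sha256:
        raise ValueError("image hash mismatch")
    return bytes(staged)


image = b"firmware-v2-image"
digest = hashlib.sha256(image).hexdigest()
chunks = make_chunks(image)
fresh = receive_image(chunks, digest)
resumed = receive_image(chunks, digest, staged=bytearray(image[:8]))
```

A device that raises here would reject the firmware and report the failure, as described in the verification step.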
- Communication Security
  - HTTPS / TLS1.2+
  - Device Token / X.509 authentication
- Firmware Security
  - Firmware signing (RSA/ECDSA)
  - Prevent tampering
- Redundancy
  - Dual partition write for fail-safe
  - Chunk verification to prevent corruption
- Logging & Monitoring
  - Record each OTA success/failure
  - Exportable CSV/JSON logs for analysis
- Device Overview
  - List devices, FW versions, Online/Offline
  - OTA progress bar and status
- OTA Publish
  - Select firmware version
  - Target device group / rollout %
  - Schedule updates
- History & Reporting
  - Success rate
  - Failure reasons
  - Rollback records
- Delta Update: Reduce transfer size for low-resource MCU
- Concurrent device limits: Avoid network congestion
- Retry strategy: Automatic retry with exponential backoff
- A/B testing: Beta vs. Stable groups for phased rollout
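The retry strategy above could compute its wait times like this. The base delay, the cap, and the optional jitter are assumed parameters chosen for illustration.

```python
import random
from typing import List, Optional


def backoff_delays(max_retries: int, base_s: float = 2.0, cap_s: float = 300.0,
                   jitter: float = 0.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return the wait before each retry: base * 2^attempt, capped at cap_s.

    `jitter` adds up to that fraction of the delay at random, so that many
    devices failing at once do not all retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(cap_s, base_s * (2 ** attempt))
        delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


delays = backoff_delays(5, base_s=2.0, cap_s=20.0)
```

With no jitter, five retries at a 2 s base and a 20 s cap wait 2, 4, 8, 16, and 20 seconds.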
- Goal: Automatic small-batch updates (Rollout) to reduce risk and control traffic
- Key Parameters:
  - Rollout % or Batch Size
  - Interval / Delay between batches
  - Max Retry per device
  - Fail Threshold (pause rollout if too many failures)
- Automation Flow:
  - Cloud divides devices into batches based on rollout parameters
  - Each batch is scheduled at a specific time
  - Devices check in with the cloud to see if their batch is ready
  - Cloud allows the batch to download firmware
  - Status is tracked; the next batch is released only if the success rate is acceptable
- Benefits:
  - Avoids network congestion
  - Allows early detection of firmware issues
  - Supports rollback if needed
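The automation flow above can be sketched as a loop that releases one batch at a time and pauses when the failure rate crosses the threshold. The function names and the `update_device` callback are hypothetical. The interval between batches and per-device retries are omitted to keep the example short.

```python
from typing import Callable, List, Sequence, Tuple


def make_batches(device_ids: Sequence[str], batch_size: int) -> List[List[str]]:
    """Divide devices into fixed-size batches in a stable (sorted) order."""
    ordered = sorted(device_ids)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def run_rollout(batches: List[List[str]],
                update_device: Callable[[str], bool],
                fail_threshold: float) -> Tuple[int, bool]:
    """Release batches in order; pause once a batch fails too often.

    `update_device` returns True on success. Returns the number of batches
    released and whether the rollout was paused.
    """
    released = 0
    for batch in batches:
        results = [update_device(device_id) for device_id in batch]
        released += 1
        failure_rate = results.count(False) / len(batch)
        if failure_rate > fail_threshold:
            return released, True  # pause: later batches are never released
    return released, False


devices = [f"dev-{i:02d}" for i in range(10)]
batches = make_batches(devices, batch_size=4)
failing = {"dev-04", "dev-05"}
outcome = run_rollout(batches, lambda d: d not in failing, fail_threshold=0.25)
```

In this run the first batch succeeds fully, the second batch has a 50% failure rate against a 25% threshold, and the third batch is held back.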
flowchart TD
%% Nodes
UP[User Portal: Publish OTA & Rollout Schedule]
C1[Cloud: Device Management & Rollout Scheduler]
C2[Cloud: Firmware Storage & Metadata]
M1[MCU: Check-in / Poll for Update]
M2[MCU: Download Firmware #40;chunked#41;]
M3[MCU: Verify & Write Flash]
M4[MCU: Reboot & Report Status]
M5[MCU: Rollback if Failed]
%% MQTT Path
UP --> C1
C1 --> C2
C2 --> M1
M1 --> M2
M2 --> M3
M3 --> M4
M4 --> C1
M5 --> C1
%% CoAP Path (intermittent / low bandwidth)
M1 -.-> C1
C1 -.-> M1
M1 --> M2
M2 --> M3
M3 --> M4
M4 --> C1
M5 --> C1
%% Styling
style UP fill:#bfb,stroke:#333,stroke-width:1px
style C1 fill:#bbf,stroke:#333,stroke-width:1px
style C2 fill:#bbf,stroke:#333,stroke-width:1px
style M1 fill:#f9f,stroke:#333,stroke-width:1px
style M2 fill:#d0f0fd,stroke:#333,stroke-width:1px
style M3 fill:#d0f0fd,stroke:#333,stroke-width:1px
style M4 fill:#bbf,stroke:#333,stroke-width:1px
style M5 fill:#fcc,stroke:#333,stroke-width:1px
- MQTT
  - Persistent connection
  - Immediate push of OTA batches
  - Suitable for Wi-Fi / Ethernet MCU
- CoAP
  - Short connection, intermittent polling
  - Suitable for Cellular / NB-IoT MCU
  - OTA delay depends on polling interval
- Shared Steps
  - MCU downloads firmware in chunks
  - Verifies, writes flash, reboots
  - Reports status to Cloud
  - Automatic rollback on failure
- Use chunked OTA to reduce retry cost
- Use Delta Update to reduce firmware size
- For Cellular MCU:
  - Short check-in intervals
  - Sleep after download/verification
- Rollout batches reduce network peak load
- CoAP observe or MQTT can balance immediacy vs traffic
flowchart TD
%% Device States
A[Power On / First Boot] --> B[Device Registration]
B --> C{Registration Success?}
C -- Yes --> D[Active / Idle]
C -- No --> E[Retry Registration / Alert]
%% OTA Process
D --> F{Check for OTA?}
F -- No --> D
F -- Yes --> G[Download Firmware Chunked]
G --> H[Verify & Write Flash]
H --> I{Verification Success?}
I -- Yes --> J[Reboot & Activate New FW]
I -- No --> K[Rollback to Previous FW]
K --> D
%% Reporting
J --> L[Report OTA Success to Cloud]
K --> M[Report OTA Failure / Rollback]
%% Device Lifecycle End
D --> N[Decommission / Retire Device]
L --> D
M --> D
%% Styling
style A fill:#fef2c0,stroke:#333,stroke-width:1px
style B fill:#bbf,stroke:#333,stroke-width:1px
style C fill:#f9f,stroke:#333,stroke-width:1px
style D fill:#bfb,stroke:#333,stroke-width:1px
style E fill:#fcc,stroke:#333,stroke-width:1px
style F fill:#ffe0b3,stroke:#333,stroke-width:1px
style G fill:#d0f0fd,stroke:#333,stroke-width:1px
style H fill:#d0f0fd,stroke:#333,stroke-width:1px
style I fill:#f9f,stroke:#333,stroke-width:1px
style J fill:#bbf,stroke:#333,stroke-width:1px
style K fill:#fcc,stroke:#333,stroke-width:1px
style L fill:#bfb,stroke:#333,stroke-width:1px
style M fill:#f99,stroke:#333,stroke-width:1px
style N fill:#ddd,stroke:#333,stroke-width:1px
- Power On / Registration
  - Device starts and registers with cloud
  - Registration may retry on failure
- Active / Idle
  - Device waits for OTA or normal operation
- OTA Process
  - MCU checks in with Cloud for available updates
  - Downloads firmware in chunks
  - Verifies and writes to flash
  - Reboots to activate new firmware
  - If verification fails → automatic rollback
- Reporting
  - Cloud receives OTA success or failure report
- Decommission / Retire
  - Device can be retired when no longer used
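The lifecycle above maps naturally onto a transition table. The state and event names below are assumptions chosen to mirror the stages listed; a real device would likely track more detail, such as retry counts.

```python
from typing import Dict, Iterable, Tuple

# (current state, event) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("PowerOn", "boot"): "Registering",
    ("Registering", "registered"): "Idle",
    ("Registering", "register_failed"): "Registering",  # retry registration
    ("Idle", "update_available"): "Downloading",
    ("Downloading", "download_done"): "Verifying",
    ("Verifying", "verify_ok"): "Rebooting",
    ("Verifying", "verify_failed"): "RolledBack",
    ("Rebooting", "reported"): "Idle",   # success reported to cloud
    ("RolledBack", "reported"): "Idle",  # failure reported to cloud
    ("Idle", "retire"): "Retired",
}


def step(state: str, event: str) -> str:
    """Apply one event; reject anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}") from None


def run(events: Iterable[str], state: str = "PowerOn") -> str:
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state


final_state = run(["boot", "registered", "update_available",
                   "download_done", "verify_ok", "reported"])
```

Keeping the transitions in one table makes illegal moves, such as starting a download on a retired device, fail loudly instead of being silently accepted.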