object_store EOF retry bug causing data duplication

Bug Summary

If a reqwest timeout occurs after all bytes of an S3 response body have been consumed but before the EOF signal arrives, retry_stream retries with a Range header like bytes=5-4 (where 5 is the file size and 4 is the index of the last byte). Because the first byte position is greater than the last, this is an invalid, backwards range.
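To make the arithmetic concrete, here is a minimal sketch (variable names are illustrative, not the actual retry_stream internals) of how that backwards range falls out once every byte has already been consumed:

// Sketch with illustrative names: forming the retried Range header after the
// whole 5-byte body ("hello") has already been read.
let file_len: u64 = 5;    // total object size
let bytes_read: u64 = 5;  // everything was consumed before the timeout
// HTTP byte ranges are inclusive, so "the rest of the file" becomes
// first-byte-pos = 5, last-byte-pos = 4: a backwards, invalid range.
let range_header = format!("bytes={}-{}", bytes_read, file_len - 1);
assert_eq!(range_header, "bytes=5-4");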

Per RFC 7233, servers MUST ignore syntactically invalid Range headers. S3 follows this spec and returns 200 OK with the full file instead of 206 Partial Content or 416 Range Not Satisfiable.

Since retry_stream doesn't validate that it received a 206 response, it reads the full file again, causing data duplication.

Testing Notes

  • Requires real AWS S3 - LocalStack returns 416 for invalid ranges (non-compliant with RFC 7233), so it won't reproduce the bug
  • If you have an S3 bucket you can read/write to, test with:
    ./demo_eof_retry_bug.sh your-bucket test-key.txt

Fix

Check if range.start >= range.end before retrying - if we've already read everything, return EOF instead of retrying with an invalid range.

// In retry_stream, before attempting retry:
if range.start >= range.end {
    return Ok(None);  // Already read everything, signal EOF
}
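A complementary guard, addressing the missing 206 validation noted above, would check the status of a ranged retry before appending its body. This is only a sketch; validate_ranged_retry and its parameters are hypothetical names, not part of the object_store API:

// Sketch only: `status` is the HTTP status of the retried request and
// `sent_range` records whether a Range header was attached (hypothetical names).
fn validate_ranged_retry(status: u16, sent_range: bool) -> Result<(), String> {
    // A ranged retry answered with 200 means the server ignored the Range
    // header and is resending the whole object; appending that body would
    // duplicate data, so surface an error instead.
    if sent_range && status != 206 {
        return Err(format!("expected 206 Partial Content, got {status}"));
    }
    Ok(())
}
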
prompt$ bash demo_eof_retry_bug.sh BUCKET test-file.txt
=== EOF Retry Bug Demo ===
Bucket: BUCKET
Region: us-west-2
Uploading test file...
Uploaded 5 bytes ("hello") to s3://BUCKET/test-file.txt
Running rust demo...
--- Reading file with intentional delay ---
Read 5 bytes: "hello"
Sleeping 15s (longer than 10s timeout)...
Continuing to read...
2026-01-16T00:06:26.951292Z INFO object_store::client::get: Encountered error while reading response body: HTTP error: request or response body error. Retrying in 0.1s
Read 5 more bytes: "hello"
--- Result ---
Total bytes read: 10
Content: "hellohello"
BUG: Data was duplicated! Expected 5 bytes, got 10
After timeout, retry_stream sent Range: bytes=5-4 (invalid).
S3 ignored the invalid Range header and returned the full file (200 OK).
Without validation, the full file was appended again.
--- S3 Range Header Behavior ---
Valid range (bytes=0-4):
HTTP/1.1 206 Partial Content
Content-Range: bytes 0-4/5
hello
Range past EOF (bytes=5-10):
HTTP/1.1 416 Requested Range Not Satisfiable
Invalid/backwards range (bytes=5-4) - the buggy retry request:
HTTP/1.1 200 OK
hello
^ S3 returned 200 (not 206), ignoring the invalid Range per RFC 7233.
This causes retry_stream to re-read the entire file.
Cleaning up...
delete: s3://BUCKET/test-file.txt
//! Demo: EOF retry bug causing data duplication
//!
//! Run via: ./demo_eof_retry_bug.sh
//! Or directly: AWS_BUCKET=bucket AWS_REGION=region rust-script demo_eof_retry_bug.rs
//!
//! ```cargo
//! [dependencies]
//! tokio = { version = "1", features = ["full"] }
//! object_store = { version = "0.13", features = ["aws"] }
//! futures = "0.3"
//! bytes = "1"
//! tracing-subscriber = { version = "0.3", features = ["env-filter"] }
//! ```
// Dep for v0.13:
// object_store = { version = "0.13", features = ["aws"] }
// Dep for local path:
// object_store = { path = ".", features = ["aws"] }
use bytes::Bytes;
use futures::StreamExt;
use object_store::aws::AmazonS3Builder;
use object_store::path::Path;
use object_store::{ClientOptions, ObjectStoreExt};
use std::env;
use std::time::Duration;
#[tokio::main]
async fn main() {
    // Enable object_store logs to see retry behavior
    tracing_subscriber::fmt()
        .with_env_filter("object_store=info")
        .init();

    let bucket = env::var("AWS_BUCKET").expect("AWS_BUCKET required");
    let key = env::var("AWS_KEY").expect("AWS_KEY required");
    let region = env::var("AWS_REGION").unwrap_or_else(|_| "us-west-2".into());

    // Client with 10s timeout - we'll sleep 15s to trigger timeout after reading
    let options = ClientOptions::new().with_timeout(Duration::from_secs(10));
    let store = AmazonS3Builder::from_env()
        .with_bucket_name(&bucket)
        .with_region(&region)
        .with_client_options(options)
        .build()
        .expect("Failed to build S3 client");
    let path = Path::from(key);
    let expected_content = Bytes::from_static(b"hello");

    // Get the file as a stream
    println!("--- Reading file with intentional delay ---");
    let result = store.get(&path).await.expect("Failed to GET");
    let mut stream = result.into_stream();
    let mut all_bytes = Vec::new();

    // Read first chunk (should be all 5 bytes)
    if let Some(chunk) = stream.next().await {
        let bytes = chunk.expect("Failed to read chunk");
        println!(
            "Read {} bytes: {:?}",
            bytes.len(),
            String::from_utf8_lossy(&bytes)
        );
        all_bytes.extend_from_slice(&bytes);
    }

    // Sleep longer than timeout - simulates timeout after reading all content but before EOF
    println!("Sleeping 15s (longer than 10s timeout)...");
    tokio::time::sleep(Duration::from_secs(15)).await;

    // Try to read more - this triggers the retry bug
    println!("Continuing to read...");
    while let Some(chunk) = stream.next().await {
        match chunk {
            Ok(bytes) => {
                println!(
                    "Read {} more bytes: {:?}",
                    bytes.len(),
                    String::from_utf8_lossy(&bytes)
                );
                all_bytes.extend_from_slice(&bytes);
            }
            Err(e) => {
                println!("Error reading: {e}");
                break;
            }
        }
    }

    println!("\n--- Result ---");
    println!("Total bytes read: {}", all_bytes.len());
    println!("Content: {:?}", String::from_utf8_lossy(&all_bytes));

    if all_bytes.len() > expected_content.len() {
        println!(
            "\nBUG: Data was duplicated! Expected {} bytes, got {}",
            expected_content.len(),
            all_bytes.len()
        );
        println!(
            "After timeout, retry_stream sent Range: bytes={}-{} (invalid).",
            expected_content.len(),
            expected_content.len() - 1
        );
        println!("S3 ignored the invalid Range header and returned the full file (200 OK).");
        println!("Without validation, the full file was appended again.");
    } else if all_bytes.len() == expected_content.len() {
        println!("\nOK: Correct number of bytes received");
    } else {
        println!("\nWARNING: Received fewer bytes than expected");
    }
}
#!/bin/bash
# Demo: EOF retry bug causing data duplication in object_store
#
# Usage: ./demo_eof_retry_bug.sh <bucket> <key>
#
# Requires: aws cli, cargo +nightly (or rust-script)
set -e
if [ $# -lt 2 ]; then
  echo "Usage: ./demo_eof_retry_bug.sh <bucket> <key>"
  echo "Example: ./demo_eof_retry_bug.sh my-bucket test/eof-bug.txt"
  exit 1
fi
BUCKET="$1"
KEY="$2"
REGION="${AWS_REGION:-us-west-2}"
echo "=== EOF Retry Bug Demo ==="
echo "Bucket: $BUCKET"
echo "Region: $REGION"
echo ""
# Upload test file
echo "Uploading test file..."
echo -n "hello" | aws s3 cp - "s3://$BUCKET/$KEY" --region "$REGION"
echo "Uploaded 5 bytes (\"hello\") to s3://$BUCKET/$KEY"
echo ""
# Run the rust demo with credentials
echo "Running rust demo..."
eval "$(aws configure export-credentials --format env)"
export AWS_BUCKET="$BUCKET"
export AWS_KEY="$KEY"
export AWS_REGION="$REGION"
# Try cargo +nightly first, fall back to rust-script
if cargo +nightly -Zscript demo_eof_retry_bug.rs 2>/dev/null; then
  :
elif command -v rust-script &>/dev/null; then
  rust-script demo_eof_retry_bug.rs
else
  echo "Error: Need 'cargo +nightly' or 'rust-script' installed"
  exit 1
fi
# Demo S3's behavior with various Range headers
echo ""
echo "--- S3 Range Header Behavior ---"
PRESIGNED=$(aws s3 presign "s3://$BUCKET/$KEY" --region "$REGION")
echo ""
echo "Valid range (bytes=0-4):"
curl -s -D - -H "Range: bytes=0-4" "$PRESIGNED" | grep -E "^(HTTP|Content-Range|hello)"
echo ""
echo "Range past EOF (bytes=5-10):"
curl -s -D - -H "Range: bytes=5-10" "$PRESIGNED" | grep -E "^(HTTP|Content-Range|InvalidRange)" | head -3
echo ""
echo "Invalid/backwards range (bytes=5-4) - the buggy retry request:"
curl -s -D - -H "Range: bytes=5-4" "$PRESIGNED" | grep -E "^(HTTP|Content-Range|hello)"
echo ""
echo "^ S3 returned 200 (not 206), ignoring the invalid Range per RFC 7233."
echo " This causes retry_stream to re-read the entire file."
# Cleanup
echo ""
echo "Cleaning up..."
aws s3 rm "s3://$BUCKET/$KEY" --region "$REGION"