@FNGarvin
Last active January 12, 2026 18:10
🛠️ HOWTO: Download a Single File From a Docker Image 🛠️

Quickstart:

Skip to the addendum for an automated script.

1. Set IMAGE and Collect Metadata

Define your image and use skopeo to pull the build history and the layer manifest.

IMAGE="docker://SOME_USERNAME/SOME_REPONAME[:OPTIONAL_TAG]"

skopeo inspect --config "$IMAGE" > history.json
skopeo inspect "$IMAGE" > metadata.json

2. Map History to Layer Digests

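Run this jq command to see a side-by-side mapping of every layer-producing build command to its corresponding SHA256 layer digest. History entries marked empty_layer (such as ENV or CMD) are filtered out because they produce no layer. Identify the idx and sha for the instruction that added your target file (e.g., COPY start.sh).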

jq -n --slurpfile h history.json --slurpfile m metadata.json '($h[0].history|map(select(.empty_layer|not))) as $h|$m[0].Layers as $l|[range(0;$l|length)|{idx:.,cmd:$h[.].created_by,sha:$l[.]}]'
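
For readers who would rather see the logic spelled out, the same mapping can be sketched in Ruby. The sample data below is invented for illustration (the digests are placeholders, not real layers), and `map_history_to_layers` is a hypothetical helper name. The only assumptions carried over from the jq filter are that history entries flagged `empty_layer` have no blob and that the remaining entries line up one-to-one, in order, with the `Layers` array.

```ruby
require 'json'

# Hypothetical stand-ins for history.json and metadata.json.
# The digests are placeholders, not real layers.
config = JSON.parse(<<~JSON)
  {"history": [
    {"created_by": "ADD rootfs.tar.gz / # buildkit"},
    {"created_by": "ENV APP_HOME=/app", "empty_layer": true},
    {"created_by": "COPY start.sh / # buildkit"}
  ]}
JSON
metadata = JSON.parse(<<~JSON)
  {"Layers": ["sha256:aaaa", "sha256:bbbb"]}
JSON

# Pair each layer-producing history entry with the digest at the same position.
def map_history_to_layers(config, metadata)
  steps = config['history'].reject { |h| h['empty_layer'] }
  layers = metadata['Layers']
  unless steps.length == layers.length
    raise "history (#{steps.length}) and layers (#{layers.length}) do not line up"
  end
  steps.each_with_index.map do |step, idx|
    { idx: idx, cmd: step['created_by'], sha: layers[idx] }
  end
end

mapping = map_history_to_layers(config, metadata)
mapping.each { |row| puts format('%-3d %-14s %s', row[:idx], row[:sha], row[:cmd]) }
```

In this sample the ENV entry is skipped, so the COPY instruction ends up at idx 1 next to the second digest. That is the pair you would carry into Step 3.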

3. Surgical Extraction

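Update the LAYER_SHA and FILE_TO_EXTRACT variables based on your findings in Step 2, giving FILE_TO_EXTRACT as the path inside the image without a leading slash. Then run the following commands to fetch an authentication token, download only that one layer, and extract your file from it. The endpoints are Docker Hub's, so this works as written only for images hosted there, and official images must be written with their library/ namespace (for example docker://library/alpine:latest) for the repository path to resolve.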

LAYER_SHA="sha256:PASTE_SHA_FROM_STEP_2_HERE"
FILE_TO_EXTRACT="start.sh"

TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:$(R="${IMAGE#docker://}"; echo "${R%:*}"):pull" | jq -r .token)
curl -L -H "Authorization: Bearer $TOKEN" "https://registry-1.docker.io/v2/$(R="${IMAGE#docker://}"; echo "${R%:*}")/blobs/$LAYER_SHA" -o fragment.tar.gz
tar -zxvf fragment.tar.gz "$FILE_TO_EXTRACT"
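
The shell parameter expansion above is terse, so here is a rough Ruby sketch of the same derivation: turning an IMAGE reference into the Docker Hub repository path and the two URLs used in this step. The helper names `hub_repository` and `hub_urls` are made up for this example, and unlike the one-liner it also drops a digest suffix and adds the library/ namespace for official images. It performs no network access; it only builds strings.

```ruby
# Derive the Docker Hub repository path ("namespace/name") from an image reference.
# Assumes the image lives on Docker Hub.
def hub_repository(image)
  ref = image.sub(%r{\Adocker://}, '')   # drop the skopeo transport prefix
  ref = ref.sub(%r{\Adocker\.io/}, '')   # drop an explicit Docker Hub host
  ref = ref.split('@').first             # drop a digest, if any
  ref = ref.sub(/:[^\/:]+\z/, '')        # drop a trailing tag, if any
  ref.include?('/') ? ref : "library/#{ref}" # official images sit under library/
end

# Build the token and blob URLs that the curl commands request.
def hub_urls(image, layer_sha)
  repo = hub_repository(image)
  {
    token: "https://auth.docker.io/token?service=registry.docker.io&scope=repository:#{repo}:pull",
    blob: "https://registry-1.docker.io/v2/#{repo}/blobs/#{layer_sha}"
  }
end

urls = hub_urls('docker://alpine:latest', 'sha256:PASTE_SHA_FROM_STEP_2_HERE')
puts urls[:token]
puts urls[:blob]
```

The token URL yields a short-lived bearer token scoped to pulling from that one repository, and the blob URL is what curl follows (with -L) to the layer archive.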

Addendum:

Attached is a Ruby script that automates the procedure and wraps it in a convenient NCurses interface. It expects libncurses-dev, skopeo, curl, jq, and tar to be installed, and you'll also need the Ruby curses gem. All of this is easy to set up; ask Gemini or a similar assistant for help if necessary.

#!/usr/bin/env ruby
#
# container_extract.rb
#
# Copyright (c) FNGarvin 2026
#
# A TUI tool to surgically extract files from Docker image layers
# without pulling the entire image.
#
# Usage:
# ./container_extract.rb <image_name>
# ./container_extract.rb docker://alpine:latest
#
# Dependencies:
# [apt] install libncurses-dev skopeo curl jq tar
# gem install curses
#
require 'json'
require 'curses'
require 'shellwords'
require 'fileutils'
require 'open3'
require 'uri'
# -----------------------------------------------------------------------------
# Configuration & Helpers
# -----------------------------------------------------------------------------
def error_exit(msg)
  Curses.close_screen if Curses.stdscr
  puts "\e[31m[ERROR]\e[0m #{msg}"
  exit 1
end

def check_dependencies
  ['skopeo', 'curl', 'tar'].each do |cmd|
    system("which #{cmd} > /dev/null 2>&1") || error_exit("Missing dependency: #{cmd}")
  end
end

def normalize_image_input(input)
  # Basic prefix check to help skopeo out, but we rely on Skopeo for the final normalization
  return input if input.start_with?('docker://')
  "docker://#{input}"
end
# -----------------------------------------------------------------------------
# Core Logic
# -----------------------------------------------------------------------------
def fetch_metadata(image)
  puts "Fetching metadata for #{image}..."

  # 1. Get Config (History)
  # capture3 drains stdout and stderr together, so a full stderr pipe cannot stall the read.
  config_json, config_err, config_status = Open3.capture3("skopeo", "inspect", "--config", image)
  unless config_status.success?
    error_exit "Failed to fetch config. Check image name or permissions.\n#{config_err}"
  end

  # 2. Get Manifest (Layers & Canonical Name)
  manifest_json, manifest_err, manifest_status = Open3.capture3("skopeo", "inspect", image)
  unless manifest_status.success?
    error_exit "Failed to fetch manifest.\n#{manifest_err}"
  end

  begin
    config_data = JSON.parse(config_json)
    manifest_data = JSON.parse(manifest_json)
  rescue JSON::ParserError
    error_exit "Failed to parse JSON output from skopeo."
  end

  # Capture the Canonical Name (e.g., 'alpine' -> 'docker.io/library/alpine')
  canonical_name = manifest_data['Name']

  # 3. Map History to Layers
  # Entries flagged empty_layer (ENV, CMD, ...) have no blob, so drop them first.
  history_items = config_data['history'].reject { |h| h['empty_layer'] }
  layers = manifest_data['Layers']
  if history_items.length != layers.length
    error_exit "Mismatch between history entries (#{history_items.length}) and layers (#{layers.length})."
  end

  all_layers = history_items.zip(layers).map.with_index do |(h, l), idx|
    {
      index: idx,
      command: h['created_by'].to_s.sub('/bin/sh -c #(nop) ', '').strip,
      digest: l
    }
  end

  # Only COPY and ADD layers are offered in the picker.
  filtered_layers = all_layers.select do |layer|
    layer[:command] =~ /^(COPY|ADD)/
  end

  return filtered_layers, canonical_name
end
# -----------------------------------------------------------------------------
# UI Logic
# -----------------------------------------------------------------------------
def truncate(str, len)
  return str if str.length <= len
  str[0, len - 3] + "..."
end

def run_tui(layers)
  Curses.init_screen
  Curses.start_color
  Curses.curs_set(0) # Hide cursor
  Curses.noecho
  Curses.init_pair(1, Curses::COLOR_WHITE, Curses::COLOR_BLUE) # Selected
  Curses.init_pair(2, Curses::COLOR_WHITE, Curses::COLOR_BLACK) # Normal
  Curses.init_pair(3, Curses::COLOR_GREEN, Curses::COLOR_BLACK) # Header
  win = Curses.stdscr
  win.keypad = true
  selected_idx = 0
  scroll_offset = 0
  max_y = win.maxy
  max_x = win.maxx

  loop do
    win.clear
    win.attron(Curses.color_pair(3) | Curses::A_BOLD)
    win.setpos(0, 0)
    header = "IDX | %-15s | COMMAND" % ["DIGEST"]
    win.addstr(header)
    win.attroff(Curses.color_pair(3) | Curses::A_BOLD)

    list_height = max_y - 2
    viewable_layers = layers[scroll_offset, list_height] || []
    viewable_layers.each_with_index do |layer, i|
      row = i + 1
      is_selected = (scroll_offset + i) == selected_idx
      attr = is_selected ? Curses.color_pair(1) : Curses.color_pair(2)
      win.attron(attr)
      line_str = "%-3d | %-15s | %s" % [
        layer[:index],
        layer[:digest][7..18],
        layer[:command].gsub("\t", " ")
      ]
      line_str = line_str.ljust(max_x)
      win.setpos(row, 0)
      win.addstr(truncate(line_str, max_x))
      win.attroff(attr)
    end

    win.setpos(max_y - 1, 0)
    win.addstr("UP/DOWN: Navigate | ENTER: Select | Q: Quit")
    win.refresh

    case win.getch
    when Curses::Key::UP
      selected_idx -= 1 if selected_idx > 0
      scroll_offset -= 1 if selected_idx < scroll_offset
    when Curses::Key::DOWN
      selected_idx += 1 if selected_idx < layers.length - 1
      scroll_offset += 1 if selected_idx >= scroll_offset + list_height
    when 10, 13 # Enter
      return layers[selected_idx]
    when 'q', 'Q'
      return nil
    end
  end
ensure
  Curses.close_screen
end
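def infer_filename(command)
  # Best-effort guess at the extracted path from a COPY/ADD history entry.
  # Flags such as --chown=user:group or --link are skipped, and the legacy
  # builder's "COPY file:<hash> in /dest" form is handled via the optional "in".
  if command =~ /^(?:COPY|ADD)\s+(?:--[\w-]+(?:=\S+)?\s+)*["']?([^"'\s]+)["']?\s+(?:in\s+)?["']?([^"'\s]+)["']?/
    source = $1
    dest = $2
    if dest.end_with?('/')
      path = dest + File.basename(source)
    else
      path = dest
    end
    # Paths inside a layer archive carry no leading slash.
    return path.sub(/^\//, '')
  end
  ""
end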
# -----------------------------------------------------------------------------
# Extraction Logic
# -----------------------------------------------------------------------------
def extract_file(canonical_name, layer_digest, filename)
  # canonical_name is trustworthy (from Skopeo). e.g., "docker.io/library/alpine"
  # We just need to strip the "docker.io/" prefix for the Auth Scope.
  # Remove protocol if present (though Name usually doesn't have it)
  clean_name = canonical_name.sub('docker://', '')
  # Remove docker.io prefix for token scope
  # This handles: "docker.io/library/alpine" -> "library/alpine"
  repo_scoped = clean_name.sub(/^docker\.io\//, '')
  # Strip tag or digest to get the repository name
  # We use rpartition to handle ports correctly (e.g. localhost:5000/image:tag)
  if repo_scoped.include?('@')
    repo_name = repo_scoped.split('@').first
  elsif repo_scoped.include?(':')
    # If the last colon is followed by a tag (no slashes), strip it.
    # Otherwise, it might be a port number (host:5000/image).
    # Heuristic: Tags usually don't have slashes.
    base, _sep, tag = repo_scoped.rpartition(':')
    if tag.include?('/')
      repo_name = repo_scoped # Colon was likely a port
    else
      repo_name = base
    end
  else
    repo_name = repo_scoped
  end

  # 1. Get Auth Token
  # Note: the token and blob endpoints below are Docker Hub's, so images hosted
  # on other registries will fail at this point.
  puts "Authenticating scope: #{repo_name}..."
  scope = "repository:#{repo_name}:pull"
  token_url = "https://auth.docker.io/token?service=registry.docker.io&scope=#{scope}"
  # Pass arguments as a list so nothing in the URL is interpreted by a shell.
  token_json, _status = Open3.capture2("curl", "-s", token_url)
  token = begin
    JSON.parse(token_json)['token']
  rescue JSON::ParserError
    nil
  end
  unless token
    error_exit "Failed to obtain auth token. Response: #{token_json}"
  end

  # 2. Download Layer
  blob_url = "https://registry-1.docker.io/v2/#{repo_name}/blobs/#{layer_digest}"
  puts "Downloading layer #{layer_digest[0..12]}..."
  layer_file = "layer_#{layer_digest[7..15]}.tar.gz"
  # -f makes curl exit non-zero on HTTP errors (401, 404), so the check below can fire.
  unless system("curl", "-fL", "-H", "Authorization: Bearer #{token}", blob_url, "-o", layer_file)
    error_exit "Download failed."
  end

  # 3. Extract File (or Keep Archive)
  if filename.empty?
    puts "\n\e[32m[SUCCESS]\e[0m Download complete. Saved to ./#{layer_file}"
  else
    puts "Extracting '#{filename}'..."
    if system("tar", "-zxvf", layer_file, filename)
      puts "\n\e[32m[SUCCESS]\e[0m Extracted to ./#{filename}"
      File.delete(layer_file) if File.exist?(layer_file)
    else
      puts "\n\e[31m[ERROR]\e[0m Extraction failed (file not found or tar error)."
      puts " Keeping layer archive: #{layer_file}"
    end
  end
end
# -----------------------------------------------------------------------------
# Main
# -----------------------------------------------------------------------------
check_dependencies
if ARGV.empty?
  puts "Usage: #{$0} <image_name>"
  exit 1
end
input_image = normalize_image_input(ARGV[0])
# Fetch Data & Canonical Name
layers, canonical_name = fetch_metadata(input_image)
if layers.empty?
  error_exit "No COPY or ADD layers found in this image."
end
# Run TUI
selected_layer = run_tui(layers)
unless selected_layer
  puts "Aborted by user."
  exit 0
end
puts "Selected Layer: #{selected_layer[:digest]}"
puts "Command: #{selected_layer[:command]}"
puts "---------------------------------------------------"
default_file = infer_filename(selected_layer[:command])
print "Enter file path to extract [#{default_file}]: "
# gets returns nil on EOF (e.g. Ctrl-D), so coerce to a string before chomp.
input = STDIN.gets.to_s.chomp
# Pressing ENTER accepts the inferred default.
# If nothing could be inferred and nothing is typed, the filename stays empty
# and the whole layer archive is kept instead.
# When a default exists, the only way to keep the archive is to enter a path that
# is not in the layer: extraction then fails and the archive is left on disk.
filename = input.empty? ? default_file : input
if filename.empty?
  puts "No filename provided. Downloading full layer archive."
end
extract_file(canonical_name, selected_layer[:digest], filename)
#EOF container_extract.rb
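
When extraction fails with "file not found", it usually helps to look at what the kept layer archive actually contains. The following is a small sketch, separate from the script above, that lists the entry paths using only Ruby's standard library. `list_layer_entries` is a hypothetical helper name, and the sketch assumes the layer is gzip-compressed, as the .tar.gz naming implies. Layers using another compression would need a different reader.

```ruby
require 'rubygems/package' # provides Gem::Package::TarReader
require 'zlib'

# Return the paths stored in a gzip-compressed layer archive.
# Assumes gzip compression and ordinary tar entry names.
def list_layer_entries(path)
  Zlib::GzipReader.open(path) do |gz|
    Gem::Package::TarReader.new(gz).map(&:full_name)
  end
end

# Usage: ruby list_layer.rb layer_xxxxxxxxx.tar.gz
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts list_layer_entries(ARGV[0])
end
```

Entry paths are printed relative to the image root, without a leading slash, which is the form the script's prompt expects.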