This document details the process SABnzbd uses to verify the integrity of downloaded files using PAR2 and repair them if necessary. PAR2 (Parchive 2) is crucial for Usenet downloads as it allows reconstruction of missing or corrupted data using redundancy files.
SABnzbd employs a multi-stage process involving internal checks and an external PAR2 command-line utility.
Before verification can begin, SABnzbd needs to understand the structure of the downloaded set according to the PAR2 index file (usually setname.par2). The function parse_par2_file handles this.
Core Logic:
- Reads the main .par2 file.
- Identifies packets starting with PAR2\x00PKT (PAR_PKT_ID).
- Verifies the MD5 checksum of each packet's data.
- Processes packets based on their type ID.
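The packet check described above can be sketched in a few lines. This is a minimal illustration that assumes the PAR2 packet layout (8-byte magic, 8-byte little-endian length, 16-byte MD5, followed by the recovery set ID, the packet type, and the body); it is not SABnzbd's actual code, and `make_packet` and `read_packet` are hypothetical helpers introduced only for this example.

```python
import hashlib
import struct

PAR_PKT_ID = b"PAR2\x00PKT"  # magic bytes that open every PAR2 packet


def make_packet(set_id: bytes, type_id: bytes, body: bytes) -> bytes:
    """Build a synthetic packet for demonstration (hypothetical helper)."""
    data = set_id + type_id + body          # the part covered by the MD5
    length = 8 + 8 + 16 + len(data)         # magic + length + MD5 + data
    return PAR_PKT_ID + struct.pack("<Q", length) + hashlib.md5(data).digest() + data


def read_packet(buf: bytes, offset: int = 0):
    """Return (type_id, data, next_offset), or None if the packet is invalid."""
    if len(buf) - offset < 64 or buf[offset:offset + 8] != PAR_PKT_ID:
        return None
    (length,) = struct.unpack("<Q", buf[offset + 8:offset + 16])
    expected_md5 = buf[offset + 16:offset + 32]
    data = buf[offset + 32:offset + length]
    if length < 64 or len(data) != length - 32:
        return None  # truncated or malformed packet
    if hashlib.md5(data).digest() != expected_md5:
        return None  # checksum mismatch: packet is corrupt
    return data[16:32], data, offset + length
```

Returning `None` on any mismatch lets a caller skip a damaged packet and keep scanning; whether SABnzbd reacts the same way is not shown here.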
Key Packet Types:
- File Description (PAR_FILE_ID): Describes an original file.

      # sabnzbd/par2file.py (lines 156-172 approx.)
      elif par2_packet_type == PAR_FILE_ID:
          # Packet Structure: TypeID, FileID, FullHash, 16kHash, FileLength, Name
          fileid = data[32:48].hex()
          if filepar2info.get(fileid):
              # Already have data for this file ID from another packet
              continue
          hash16k = data[64:80]  # MD5 hash of the first 16KB
          filesize = struct.unpack("<Q", data[80:88])[0]  # Expected size (8 bytes)
          filename = correct_unknown_encoding(data[88:].strip(b"\0"))  # Original filename
          # Store info in a dataclass instance, keyed by fileid
          filepar2info[fileid] = FilePar2Info(filename, hash16k, filesize)

  This extracts the original filename, expected size, and the MD5 hash of the first 16KB (used for identifying obfuscated files).
- Main Packet (PAR_MAIN_ID): Provides set-wide information.

      # sabnzbd/par2file.py (lines 181-188 approx.)
      elif par2_packet_type == PAR_MAIN_ID:
          # Packet Structure: TypeID, SliceSize, NumFiles, ...
          slice_size = struct.unpack("<Q", data[32:40])[0]  # Size of recovery blocks
          # Pre-calculate coefficient for efficient CRC combining
          coeff = sabctools.crc32_xpow8n(slice_size)
          nr_files = struct.unpack("<I", data[40:44])[0]  # Total files in set

  slice_size is essential for reconstructing the full file hash.
- Slice Checksum (PAR_SLICE_ID): Contains CRC32 checksums for each block (slice) of a file.

      # sabnzbd/par2file.py (lines 189-195 approx.)
      elif par2_packet_type == PAR_SLICE_ID:
          # Packet Structure: TypeID, FileID, [SliceHash, SliceCRC32]...
          fileid = data[32:48].hex()  # ID of the file these checksums belong to
          if not filecrc32.get(fileid):
              filecrc32[fileid] = []
          # Loop through checksum entries (20 bytes each)
          for i in range(48, pack_len - 32, 20):
              # Extract the 4-byte CRC32 value for this slice
              filecrc32[fileid].append(struct.unpack("<I", data[i + 16 : i + 20])[0])
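To make the fixed offsets in the File Description branch easier to follow, the sketch below builds a synthetic packet body and reads the same fields back. It assumes `data` starts at the 16-byte recovery set ID, so the packet type occupies bytes 16-32 and the file ID bytes 32-48. `FileInfo` and `parse_file_description` are hypothetical stand-ins, and plain UTF-8 decoding replaces SABnzbd's `correct_unknown_encoding`.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass
class FileInfo:  # hypothetical stand-in for SABnzbd's FilePar2Info
    filename: str
    hash16k: bytes
    filesize: int


def parse_file_description(data: bytes):
    """Return (fileid_hex, FileInfo) using the offsets from the snippet above."""
    fileid = data[32:48].hex()
    hash16k = data[64:80]                           # MD5 of the first 16KB
    (filesize,) = struct.unpack("<Q", data[80:88])  # 8-byte little-endian size
    filename = data[88:].rstrip(b"\0").decode("utf-8", errors="replace")
    return fileid, FileInfo(filename, hash16k, filesize)


# Build a synthetic body to exercise the parser
content = b"example payload"
name = b"a.bin"
padded_name = name + b"\0" * (-len(name) % 4)  # names are NUL-padded to 4 bytes
data = (
    bytes(16)                                  # recovery set ID
    + b"PAR 2.0\x00FileDesc"                   # packet type (16 bytes)
    + b"\x01" * 16                             # file ID
    + hashlib.md5(content).digest()            # MD5 of the whole file
    + hashlib.md5(content[:16384]).digest()    # MD5 of the first 16KB
    + struct.pack("<Q", len(content))          # file length
    + padded_name
)
fileid, info = parse_file_description(data)
```

Here `info.filename` comes back without its NUL padding and `info.filesize` equals the length of `content`.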
Reconstructing Full File Hash:
After parsing, SABnzbd calculates the expected CRC32 hash for each entire file using the slice_size and the collected slice CRC32s (filecrc32). This avoids reading the whole file for verification later. It uses efficient C functions from sabctools.

    # sabnzbd/par2file.py (lines 203-223 approx.)
    for fileid in filepar2info.keys():
        par2info = filepar2info[fileid]
        # Sanity check if essential info is present
        if not filecrc32.get(fileid) or not nr_files or not slice_size:
            logging.debug("Missing essential information for %s", par2info)
            continue
        slices = par2info.filesize // slice_size  # Number of full slices
        slice_nr = 0
        crc32 = 0  # Initialize combined CRC
        # Combine CRC32s of full slices
        while slice_nr < slices:
            crc32 = sabctools.crc32_multiply(crc32, coeff) ^ filecrc32[fileid][slice_nr]
            slice_nr += 1
        # Handle the last partial slice if the file size isn't an exact multiple
        if tail_size := par2info.filesize % slice_size:
            # Adjust the last slice's CRC for its actual size and combine it
            crc32 = sabctools.crc32_combine(
                crc32, sabctools.crc32_zero_unpad(filecrc32[fileid][-1], slice_size - tail_size), tail_size
            )
        # Store the final calculated full file hash
        par2info.filehash = crc32
        table[par2info.filename] = par2info  # Add to result table (nzo.par2packs)

The resulting par2info.filehash and par2info.filesize are stored in nzo.par2packs and used in the next phase.
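The combining step above relies on a property of CRC32: the checksum of a concatenation can be derived from the checksums of its parts by multiplying with a power of x modulo the CRC polynomial. The pure-Python sketch below illustrates that idea with only `zlib` for reference values. `gf_multiply`, `xpow8n`, and `crc32_from_slices` are hypothetical names, not the sabctools implementation, and the tail CRC is taken over the unpadded tail, so the zero-unpadding step from the snippet above is deliberately left out.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib


def gf_multiply(a: int, b: int) -> int:
    """Multiply two polynomials modulo the CRC-32 polynomial (bit 31 is x^0)."""
    product = 0
    mask = 1 << 31
    while mask:
        if a & mask:
            product ^= b
        mask >>= 1
        b = (b >> 1) ^ POLY if b & 1 else b >> 1  # multiply b by x
    return product


def xpow8n(n: int) -> int:
    """Return x^(8*n) modulo the CRC-32 polynomial via square-and-multiply."""
    result = 1 << 31  # the polynomial 1
    base = 1 << 23    # the polynomial x^8
    while n:
        if n & 1:
            result = gf_multiply(result, base)
        base = gf_multiply(base, base)
        n >>= 1
    return result


def crc32_from_slices(slice_crcs, slice_size: int, filesize: int) -> int:
    """Combine per-slice CRC32 values into the CRC32 of the whole file."""
    coeff = xpow8n(slice_size)  # computed once and reused for every full slice
    full, tail = divmod(filesize, slice_size)
    crc = 0
    for slice_crc in slice_crcs[:full]:
        crc = gf_multiply(crc, coeff) ^ slice_crc
    if tail:
        crc = gf_multiply(crc, xpow8n(tail)) ^ slice_crcs[full]
    return crc


# Usage: the combined value matches a CRC32 computed over all the bytes
data = bytes(range(256)) * 10 + b"tail"
slice_size = 100
slice_crcs = [zlib.crc32(data[i:i + slice_size]) for i in range(0, len(data), slice_size)]
combined = crc32_from_slices(slice_crcs, slice_size, len(data))
```

Computing the coefficient once per slice size mirrors the role `coeff` plays in the loop above: each full slice then costs one multiplication and one XOR.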
Function: quick_check_set(setname, nzo)
Purpose: A fast, internal check to verify files using CRC32 hashes calculated during download/assembly and file sizes, comparing them against the expected values parsed in Phase 1. This avoids calling the slower external par2 tool if possible.
Process:
- Retrieves the parsed PAR2 data (par2pack = nzo.par2packs.get(setname)).
- Iterates through each file described in par2pack.
- Searches the list of actual downloaded files (nzf_list = nzo.finished_files) for a match.
- Matching & Verification:
  - By Filename: If file == nzf.filename, it checks if nzf.crc32 == par2info.filehash and the actual file size matches par2info.filesize.
  - By Hash & Size (Obfuscation): If no filename match, it checks if nzf.crc32 == par2info.filehash and size matches par2info.filesize for any unused downloaded file (nzf). If a match is found:
    - The downloaded file (nzf.filename) is renamed to the correct name (file) using the renamer function.
    - Internal records (nzo.renamed_file, nzf.filename) are updated.
- Outcome: If all files listed in the PAR2 index are found (either by name or hash/size) and their CRC32/size match the expected values (or the file type is configured to be ignored), the function returns True, and the external PAR2 process is skipped. If any file is missing or fails the check, it returns False.
Code Snippet (Simplified Comparison):

    # sabnzbd/newsunpack.py (lines 1530-1581 approx.)
    result = True
    renames = {}
    found_paths: Set[str] = set()
    ignore_ext = cfg.quick_check_ext_ignore()
    for file in par2pack:  # Expected files
        par2info = par2pack[file]
        found = False
        file_to_ignore = get_ext(file).replace(".", "") in ignore_ext
        for nzf in nzf_list:  # Actual downloaded files
            # Check 1: Filename Match
            if file == nzf.filename:
                found = True
                found_paths.add(nzf.filepath)
                # Check 2: CRC32 & Size Match
                if (
                    nzf.crc32 is not None
                    and nzf.crc32 == par2info.filehash
                    and is_size(nzf.filepath, par2info.filesize)
                ):
                    logging.debug("Quick-check of file %s OK", file)
                    result &= True  # OK
                elif file_to_ignore:
                    logging.debug("Quick-check ignoring file %s", file)
                    result &= True  # Ignored is OK
                else:
                    logging.info("Quick-check of file %s failed!", file)
                    result = False  # Mismatch
                break  # Found by name
            # Check 3: Obfuscation Match (Hash & Size only)
            elif (
                nzf.filepath not in found_paths  # Not already matched
                and nzf.crc32 is not None
                and nzf.crc32 == par2info.filehash
                and is_size(nzf.filepath, par2info.filesize)
            ):
                try:
                    logging.debug("Quick-check will rename %s to %s", nzf.filename, file)
                    # Note: file can and is allowed to be in a subdirectory.
                    # Subdirectories in par2 always contain "/", not "\" so we need to normalize
                    normalized_file = os.path.normpath(file)
                    # Rename obfuscated file
                    renamer(
                        os.path.join(nzo.download_path, nzf.filename),
                        os.path.join(nzo.download_path, normalized_file),
                        create_local_directories=True,
                    )
                    renames[normalized_file] = nzf.filename
                    nzf.filename = normalized_file
                    result &= True  # OK after rename
                    found = True
                    found_paths.add(nzf.filepath)
                except IOError:
                    # Renamed failed for some reason, probably already done
                    pass
                break  # Found by hash/size
        if not found:
            if file_to_ignore:
                logging.debug("Quick-check ignoring missing file %s", file)
                continue  # Ignored missing file is OK
            logging.info("Cannot Quick-check missing file %s!", file)
            result = False  # File missing
    # Save renames if any occurred
    if renames:
        nzo.renamed_file(renames)
    # Return final result (True if all OK/renamed/ignored, False otherwise)

Functions: par2_repair(nzo, setname) and par2cmdline_verify(parfile, nzo, setname, joinables)
Trigger: Runs if quick_check_set returned False or was disabled.
Process:
- Command Construction (par2cmdline_verify):
  - Locates the PAR2 executable (PAR2_COMMAND).
  - Builds the command: par2 r [options] <main_par2_file> <wildcard>*
    - r: Verify and Repair mode.
    - [options]: User-defined options from SABnzbd's config (cfg.par_option()).
    - <main_par2_file>: Path to the setname.par2 file.
    - <wildcard>*: File pattern matching the set (e.g., setname*) to tell the tool which files to check/use.
  - Automatically adds compatibility flags (-N, -B <path>) based on detected par2cmdline version quirks.
  - Executes the command using build_and_run_command.
- Output Parsing (par2cmdline_verify):
  - Reads the command's standard output line by line.
  - Uses string matching and regular expressions to interpret the PAR2 tool's progress and results.
  - Key Output Strings/Patterns:
    - "All files are correct": Verification successful. Sets finished = True.
    - "Repair is required": Verification failed, repair needed.
    - "You need X recovery blocks": Repair needed, but not enough blocks found currently on disk.
    - "Repair is possible": Enough blocks found, repair starting.
    - "Repairing: X%" / "Processing: X%": Repair/joining progress update.
    - "Repair complete": Repair successful. Sets finished = True.
    - "Repair Failed.": Repair unsuccessful (unknown reason).
    - "There is not enough space on the disk": Disk full error.
    - File: "obfuscated_name" - is a match for "original_name": PAR2 tool identified an obfuscated file (recorded in renames).
    - File: "original_name" - found X of Y data blocks from "source_file": PAR2 tool used source_file (potentially obfuscated) to reconstruct original_name.
- Handling Insufficient Blocks:
  - If the output indicates "You need X recovery blocks", SABnzbd checks its list of known PAR2 files for this set (nzo.extrapars).
  - It calls nzo.get_extra_blocks(setname, needed_blocks) to see if downloading previously skipped .vol*.par2 files would provide enough blocks.
  - If yes:
    - par2cmdline_verify sets readd = True.
    - par2_repair returns (True, False).
    - The main SABnzbd loop pauses post-processing, downloads the required PAR2 files, and then calls par2_repair again to retry the external command.
  - If no (not enough blocks exist even in undownloaded files): The job fails with "Repair failed, not enough repair blocks".
- Outcome: par2cmdline_verify returns finished (True/False), readd (True/False), and lists of files used/renamed.
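The output-parsing step can be illustrated with a small, self-contained sketch. The regular expressions below are assumptions modelled on the strings listed above, not the patterns SABnzbd actually uses, and `parse_par2_output` is a hypothetical function that only tracks a subset of the states.

```python
import re

# Hypothetical patterns derived from the output strings listed above
NEED_BLOCKS = re.compile(r"You need (\d+)(?: more)? recovery blocks?")
PROGRESS = re.compile(r"(?:Repairing|Processing): (\d+(?:\.\d+)?)%")
IS_MATCH = re.compile(r'File: "([^"]+)" - is a match for "([^"]+)"')


def parse_par2_output(lines):
    """Return (finished, needed_blocks, renames, last_progress) from tool output."""
    finished = False
    needed_blocks = None
    renames = {}          # original name -> obfuscated name found on disk
    last_progress = None
    for raw in lines:
        line = raw.strip()
        if "All files are correct" in line or "Repair complete" in line:
            finished = True
        elif m := NEED_BLOCKS.search(line):
            needed_blocks = int(m.group(1))
        elif m := PROGRESS.search(line):
            last_progress = float(m.group(1))
        elif m := IS_MATCH.search(line):
            obfuscated, original = m.groups()
            renames[original] = obfuscated
    return finished, needed_blocks, renames, last_progress


# Usage with made-up output lines
sample = [
    'File: "a1b2c3" - is a match for "video.mkv"',
    "Repair is required",
    "You need 12 recovery blocks",
]
state = parse_par2_output(sample)
```

A caller could compare `needed_blocks` against the blocks available in not-yet-downloaded volume files to decide whether a retry is worthwhile, which is the decision described under "Handling Insufficient Blocks".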
Function: par2_repair(nzo, setname) (final part)
Trigger: Runs only if the external PAR2 process completed successfully (finished = True) and the setting Enable PAR2 cleanup (cfg.enable_par_cleanup()) is active.
Process:
- Identifies files to delete:
  - The main .par2 file used for the check (parfile).
  - The base setname.par2 file.
  - All associated .vol*.par2 files for the set (setpars).
  - Any temporary backup files (e.g., filename.rar.1) created by the PAR2 tool.
  - Any source files explicitly used by PAR2 for reconstruction (used_for_repair).
- Iterates through the deletables list and attempts to remove each file using remove_file.
Code Snippet (Identifying Deletables):

    # sabnzbd/newsunpack.py (lines 1080-1116 approx.)
    if cfg.enable_par_cleanup():
        deletables = []
        new_dir_content = os.listdir(nzo.download_path)
        # Add .1 backup files created during repair
        for path in new_dir_content:
            if get_ext(path) == ".1" and path not in old_dir_content:
                deletables.append(os.path.join(nzo.download_path, path))
        deletables.append(parfile)  # Main par2 used
        deletables.append(os.path.join(nzo.download_path, setname + ".par2"))  # Base par2
        deletables.append(os.path.join(nzo.download_path, setname + ".PAR2"))  # Case variation
        # Add source files used by par2 for reconstruction
        deletables.extend([os.path.join(nzo.download_path, f) for f in used_for_repair])
        # Add all PAR2 volume files for this set
        deletables.extend([os.path.join(nzo.download_path, nzf.filename) for nzf in setpars])
        # Delete the files
        for filepath in deletables:
            if os.path.exists(filepath):
                try:
                    remove_file(filepath)
                except OSError:
                    logging.warning("Deleting %s failed!", filepath)

SABnzbd implements a robust, multi-stage PAR2 verification and repair process. It prioritizes speed with an internal Quick Check using CRC32/size data gathered during download. When necessary, it intelligently manages an external PAR2 utility, parsing its output to track progress, handle block requirements (triggering downloads if needed), and determine success or failure, followed by optional cleanup of the PAR2 files. This ensures data integrity while optimizing for speed.