SABnzbd PAR2 Verification and Repair Explained

This document details the process SABnzbd uses to verify the integrity of downloaded files using PAR2 and repair them if necessary. PAR2 (Parchive 2) is crucial for Usenet downloads as it allows reconstruction of missing or corrupted data using redundancy files.

SABnzbd employs a multi-stage process involving internal checks and an external PAR2 command-line utility.

Phase 1: Parsing PAR2 Index Files (sabnzbd/par2file.py)

Before verification can begin, SABnzbd needs to understand the structure of the downloaded set according to the PAR2 index file (usually setname.par2). The function parse_par2_file handles this.

Core Logic:

  • Reads the main .par2 file.
  • Identifies packets starting with PAR2\x00PKT (PAR_PKT_ID).
  • Verifies the MD5 checksum of each packet's data.
  • Processes packets based on their type ID.
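
As a complement to the SABnzbd excerpts below, here is a minimal, self-contained sketch of this packet walk. It is not parse_par2_file itself: the function and constant names are illustrative, and the packet layout follows the PAR2 specification (64-byte header, packet MD5 covering everything from the Recovery Set ID onward, 16-byte type field).

# Hedged sketch of the packet walk (not sabnzbd/par2file.py; names are illustrative,
# layout per the PAR2 specification).
import hashlib
import struct

PAR_PKT_MAGIC = b"PAR2\x00PKT"  # 8-byte packet magic


def walk_par2_packets(path):
    """Yield (type_name, data) for every packet whose MD5 checksum verifies.

    'data' starts at the Recovery Set ID (packet offset 32), the same layout
    the excerpts below index into (e.g. the FileID at data[32:48]).
    """
    with open(path, "rb") as fp:
        while True:
            header = fp.read(64)  # magic(8) + length(8) + md5(16) + set id(16) + type(16)
            if len(header) < 64 or header[:8] != PAR_PKT_MAGIC:
                break  # end of file or lost alignment; a real parser would resync
            pack_len = struct.unpack("<Q", header[8:16])[0]
            packet_md5 = header[16:32]
            # The packet MD5 covers everything from the Recovery Set ID to the packet end
            data = header[32:] + fp.read(pack_len - 64)
            if hashlib.md5(data).digest() != packet_md5:
                continue  # damaged packet: skip it, the stream stays aligned
            # The type field is 16 bytes: b"PAR 2.0\x00" + a short name, null padded
            type_name = data[16:32][8:].rstrip(b"\x00")
            yield type_name, data


for type_name, data in walk_par2_packets("setname.par2"):
    if type_name == b"FileDesc":
        print("File Description packet for file id", data[32:48].hex())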

Key Packet Types:

  1. File Description (PAR_FILE_ID): Describes an original file.

    # sabnzbd/par2file.py (lines 156-172 approx.)
    elif par2_packet_type == PAR_FILE_ID:
        # Packet Structure: TypeID, FileID, FullHash, 16kHash, FileLength, Name
        fileid = data[32:48].hex()
        if filepar2info.get(fileid):
            # Already have data for this file ID from another packet
            continue
    
        hash16k = data[64:80] # MD5 hash of the first 16KB
        filesize = struct.unpack("<Q", data[80:88])[0] # Expected size (8 bytes)
        filename = correct_unknown_encoding(data[88:].strip(b"\0")) # Original filename
    
        # Store info in a dataclass instance, keyed by fileid
        filepar2info[fileid] = FilePar2Info(filename, hash16k, filesize)

    This extracts the original filename, expected size, and the MD5 hash of the first 16KB, which is used to identify obfuscated files (see the sketch after this list).

  2. Main Packet (PAR_MAIN_ID): Provides set-wide information.

    # sabnzbd/par2file.py (lines 181-188 approx.)
    elif par2_packet_type == PAR_MAIN_ID:
        # Packet Structure: TypeID, SliceSize, NumFiles, ...
        slice_size = struct.unpack("<Q", data[32:40])[0] # Size of recovery blocks
        # Pre-calculate coefficient for efficient CRC combining
        coeff = sabctools.crc32_xpow8n(slice_size)
        nr_files = struct.unpack("<I", data[40:44])[0] # Total files in set

    slice_size is essential for reconstructing the full file hash.

  3. Slice Checksum (PAR_SLICE_ID): Contains CRC32 checksums for each block (slice) of a file.

    # sabnzbd/par2file.py (lines 189-195 approx.)
    elif par2_packet_type == PAR_SLICE_ID:
        # Packet Structure: TypeID, FileID, [SliceHash, SliceCRC32]...
        fileid = data[32:48].hex() # ID of the file these checksums belong to
        if not filecrc32.get(fileid):
            filecrc32[fileid] = []
            # Loop through checksum entries (20 bytes each)
            for i in range(48, pack_len - 32, 20):
                # Extract the 4-byte CRC32 value for this slice
                filecrc32[fileid].append(struct.unpack("<I", data[i + 16 : i + 20])[0])
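
The hash16k value from the File Description packets is what allows an obfuscated file to be recognized even when its name is meaningless. The sketch below illustrates that idea; it is not SABnzbd's deobfuscation code, and the directory scan and helper names are hypothetical, but filepar2info and the FilePar2Info fields are the structures built above.

# Hedged sketch (not SABnzbd code): identify renamed/obfuscated files on disk by
# comparing the MD5 of their first 16 KB with the hash16k values parsed from the
# File Description packets above.
import hashlib
import os


def md5_of_first_16k(path):
    with open(path, "rb") as fp:
        return hashlib.md5(fp.read(16384)).digest()


def match_by_hash16k(download_dir, filepar2info):
    """Map obfuscated on-disk names to the original names in the PAR2 set."""
    matches = {}
    for name in os.listdir(download_dir):
        path = os.path.join(download_dir, name)
        if not os.path.isfile(path):
            continue
        head_hash = md5_of_first_16k(path)
        for info in filepar2info.values():
            if info.hash16k == head_hash and os.path.getsize(path) == info.filesize:
                matches[name] = info.filename  # candidate for renaming
                break
    return matches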

Reconstructing Full File Hash: After parsing, SABnzbd calculates the expected CRC32 hash for each entire file using the slice_size and the collected slice CRC32s (filecrc32). This avoids reading the whole file for verification later. It uses efficient C functions from sabctools.

# sabnzbd/par2file.py (lines 203-223 approx.)
for fileid in filepar2info.keys():
    par2info = filepar2info[fileid]
    # Sanity check if essential info is present
    if not filecrc32.get(fileid) or not nr_files or not slice_size:
        logging.debug("Missing essential information for %s", par2info)
        continue

    slices = par2info.filesize // slice_size # Number of full slices
    slice_nr = 0
    crc32 = 0 # Initialize combined CRC
    # Combine CRC32s of full slices
    while slice_nr < slices:
        crc32 = sabctools.crc32_multiply(crc32, coeff) ^ filecrc32[fileid][slice_nr]
        slice_nr += 1

    # Handle the last partial slice if the file size isn't an exact multiple
    if tail_size := par2info.filesize % slice_size:
        # Adjust the last slice's CRC for its actual size and combine it
        crc32 = sabctools.crc32_combine(
            crc32, sabctools.crc32_zero_unpad(filecrc32[fileid][-1], slice_size - tail_size), tail_size
        )
    # Store the final calculated full file hash
    par2info.filehash = crc32
    table[par2info.filename] = par2info # Add to result table (nzo.par2packs)

The resulting par2info.filehash and par2info.filesize are stored in nzo.par2packs and used in the next phase.
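
Because the combined slice CRCs reduce to the plain CRC32 of the entire file, the stored filehash could in principle be verified with nothing more than zlib, as in the hedged sketch below. SABnzbd avoids this extra full-file read by reusing the CRC32 it already accumulated while assembling the download (nzf.crc32), which is exactly what Phase 2 does.

# Hedged sketch: cross-checking a file against the parsed PAR2 info with plain zlib.
# SABnzbd does not re-read files like this; Phase 2 reuses nzf.crc32 instead.
import os
import zlib


def crc32_of_file(path, chunk_size=1024 * 1024):
    crc = 0
    with open(path, "rb") as fp:
        while chunk := fp.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def matches_par2_info(path, par2info):
    """True if the file's size and CRC32 match the FilePar2Info entry."""
    return (
        os.path.getsize(path) == par2info.filesize
        and crc32_of_file(path) == par2info.filehash
    )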

Phase 2: Quick Check (Internal Verification) (sabnzbd/newsunpack.py)

Function: quick_check_set(setname, nzo)

Purpose: A fast internal check that compares the CRC32 hashes calculated during download/assembly, together with the on-disk file sizes, against the expected values parsed in Phase 1. When it succeeds, the slower external par2 tool is skipped entirely.

Process:

  1. Retrieves the parsed PAR2 data (par2pack = nzo.par2packs.get(setname)).
  2. Iterates through each file described in par2pack.
  3. Searches the list of actual downloaded files (nzf_list = nzo.finished_files) for a match.
  4. Matching & Verification:
    • By Filename: If file == nzf.filename, it checks if nzf.crc32 == par2info.filehash and the actual file size matches par2info.filesize.
    • By Hash & Size (Obfuscation): If no filename match, it checks if nzf.crc32 == par2info.filehash and size matches par2info.filesize for any unused downloaded file (nzf). If a match is found:
      • The downloaded file (nzf.filename) is renamed to the correct name (file) using the renamer function.
      • Internal records (nzo.renamed_file, nzf.filename) are updated.
  5. Outcome: If all files listed in the PAR2 index are found (either by name or hash/size) and their CRC32/size match the expected values (or the file type is configured to be ignored), the function returns True, and the external PAR2 process is skipped. If any file is missing or fails the check, it returns False.

Code Snippet (Simplified Comparison):

# sabnzbd/newsunpack.py (lines 1530-1581 approx.)
result = True
renames = {}
found_paths: Set[str] = set()
ignore_ext = cfg.quick_check_ext_ignore()

for file in par2pack: # Expected files
    par2info = par2pack[file]
    found = False
    file_to_ignore = get_ext(file).replace(".", "") in ignore_ext

    for nzf in nzf_list: # Actual downloaded files
        # Check 1: Filename Match
        if file == nzf.filename:
            found = True
            found_paths.add(nzf.filepath)
            # Check 2: CRC32 & Size Match
            if (nzf.crc32 is not None and
                nzf.crc32 == par2info.filehash and
                is_size(nzf.filepath, par2info.filesize)):
                logging.debug("Quick-check of file %s OK", file)
                result &= True # OK
            elif file_to_ignore:
                 logging.debug("Quick-check ignoring file %s", file)
                 result &= True # Ignored is OK
            else:
                logging.info("Quick-check of file %s failed!", file)
                result = False # Mismatch
            break # Found by name

        # Check 3: Obfuscation Match (Hash & Size only)
        elif (nzf.filepath not in found_paths and # Not already matched
              nzf.crc32 is not None and
              nzf.crc32 == par2info.filehash and
              is_size(nzf.filepath, par2info.filesize)):
            try:
                logging.debug("Quick-check will rename %s to %s", nzf.filename, file)
                # Note: file may be in a subdirectory. Subdirectory separators in
                # PAR2 are always "/", never "\", so normalize the path first.
                normalized_file = os.path.normpath(file)
                # Rename the obfuscated file to its original name
                renamer(
                    os.path.join(nzo.download_path, nzf.filename),
                    os.path.join(nzo.download_path, normalized_file),
                    create_local_directories=True,
                )
                renames[normalized_file] = nzf.filename
                nzf.filename = normalized_file
                result &= True # OK after rename
                found = True
                found_paths.add(nzf.filepath)
            except IOError:
                # Rename failed for some reason, probably already done
                pass
            break # Found by hash/size

    if not found:
        if file_to_ignore:
            logging.debug("Quick-check ignoring missing file %s", file)
            continue # Ignored missing file is OK
        logging.info("Cannot Quick-check missing file %s!", file)
        result = False # File missing

# Save renames if any occurred
if renames:
    nzo.renamed_file(renames)
# Return the final result (True if all files were OK, renamed, or ignored)
return result

Phase 3: External PAR2 Verification & Repair (sabnzbd/newsunpack.py)

Functions: par2_repair(nzo, setname) and par2cmdline_verify(parfile, nzo, setname, joinables)

Trigger: Runs if quick_check_set returned False or was disabled.

Process:

  1. Command Construction (par2cmdline_verify; a combined sketch of steps 1-3 appears after this list):

    • Locates the PAR2 executable (PAR2_COMMAND).
    • Builds the command: par2 r [options] <main_par2_file> <wildcard>
      • r: Verify and Repair mode.
      • [options]: User-defined options from SABnzbd's config (cfg.par_option()).
      • <main_par2_file>: Path to the setname.par2 file.
      • <wildcard>: A file pattern matching the set (e.g., setname*) that tells the tool which files to check and use.
    • Automatically adds compatibility flags (-N, -B <path>) based on detected par2cmdline version quirks.
    • Executes the command using build_and_run_command.
  2. Output Parsing (par2cmdline_verify):

    • Reads the command's standard output line by line.
    • Uses string matching and regular expressions to interpret the PAR2 tool's progress and results.
    • Key Output Strings/Patterns:
      • "All files are correct": Verification successful. Sets finished = True.
      • "Repair is required": Verification failed, repair needed.
      • "You need X recovery blocks": Repair needed, but not enough blocks found currently on disk.
      • "Repair is possible": Enough blocks found, repair starting.
      • "Repairing: X%" / "Processing: X%": Repair/joining progress update.
      • "Repair complete": Repair successful. Sets finished = True.
      • "Repair Failed.": Repair unsuccessful (unknown reason).
      • "There is not enough space on the disk": Disk full error.
      • File: "obfuscated_name" - is a match for "original_name": PAR2 tool identified an obfuscated file (recorded in renames).
      • File: "original_name" - found X of Y data blocks from "source_file": PAR2 tool used source_file (potentially obfuscated) to reconstruct original_name.
  3. Handling Insufficient Blocks:

    • If the output indicates "You need X recovery blocks", SABnzbd checks its list of known PAR2 files for this set (nzo.extrapars).
    • It calls nzo.get_extra_blocks(setname, needed_blocks) to see if downloading previously skipped .vol*.par2 files would provide enough blocks.
    • If yes:
      • par2cmdline_verify sets readd = True.
      • par2_repair returns (True, False).
      • The main SABnzbd loop pauses post-processing, downloads the required PAR2 files, and then calls par2_repair again to retry the external command.
    • If no (not enough blocks exist even in undownloaded files): The job fails with "Repair failed, not enough repair blocks".
  4. Outcome: par2cmdline_verify returns finished (True/False), readd (True/False), and lists of files used/renamed.
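
The following is a simplified, hedged sketch of steps 1 to 3 above. It is not SABnzbd's par2cmdline_verify: it calls subprocess directly, expands the wildcard itself with glob, and matches only a few of the output strings listed above, whose exact wording varies between par2cmdline versions.

# Hedged sketch of steps 1-3 (not SABnzbd's implementation). Assumes a working
# par2 executable; output strings are matched loosely because they vary by version.
import glob
import re
import subprocess


def run_par2_repair(par2_executable, parfile, wildcard):
    """Return (finished, needed_blocks).

    finished=True means verification or repair succeeded; needed_blocks > 0 means
    more recovery blocks must be fetched before a repair can succeed.
    """
    command = [par2_executable, "r", parfile] + glob.glob(wildcard)
    proc = subprocess.run(command, capture_output=True, text=True)

    finished = False
    needed_blocks = 0
    for line in proc.stdout.splitlines():
        if "All files are correct" in line or "Repair complete" in line:
            finished = True
        elif m := re.search(r"You need (\d+)( more)? recovery blocks", line):
            # Not enough blocks on disk. SABnzbd would consult nzo.extrapars via
            # get_extra_blocks(); if enough blocks exist in skipped .vol*.par2
            # files it sets readd=True, downloads them, and retries par2_repair.
            needed_blocks = int(m.group(1))
        elif "Repair Failed." in line or "There is not enough space on the disk" in line:
            return False, 0  # hard failure, retrying with more blocks will not help
    return finished, needed_blocks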

Phase 4: Cleanup (sabnzbd/newsunpack.py)

Function: par2_repair(nzo, setname) (final part)

Trigger: Runs only if the external PAR2 process completed successfully (finished = True) and the setting Enable PAR2 cleanup (cfg.enable_par_cleanup()) is active.

Process:

  1. Identifies files to delete:
    • The main .par2 file used for the check (parfile).
    • The base setname.par2 file.
    • All associated .vol*.par2 files for the set (setpars).
    • Any temporary backup files (e.g., filename.rar.1) created by the PAR2 tool.
    • Any source files explicitly used by PAR2 for reconstruction (used_for_repair).
  2. Iterates through the deletables list and attempts to remove each file using remove_file.

Code Snippet (Identifying Deletables):

# sabnzbd/newsunpack.py (lines 1080-1116 approx.)
if cfg.enable_par_cleanup():
    deletables = []
    new_dir_content = os.listdir(nzo.download_path)
    # Add .1 backup files created during repair
    for path in new_dir_content:
        if get_ext(path) == ".1" and path not in old_dir_content:
            deletables.append(os.path.join(nzo.download_path, path))

    deletables.append(parfile) # Main par2 used
    deletables.append(os.path.join(nzo.download_path, setname + ".par2")) # Base par2
    deletables.append(os.path.join(nzo.download_path, setname + ".PAR2")) # Case variation

    # Add source files used by par2 for reconstruction
    deletables.extend([os.path.join(nzo.download_path, f) for f in used_for_repair])
    # Add all PAR2 volume files for this set
    deletables.extend([os.path.join(nzo.download_path, nzf.filename) for nzf in setpars])

    # Delete the files
    for filepath in deletables:
        if os.path.exists(filepath):
            try:
                remove_file(filepath)
            except OSError:
                logging.warning("Deleting %s failed!", filepath)

Conclusion

SABnzbd implements a robust, multi-stage PAR2 verification and repair process. It prioritizes speed with an internal Quick Check using CRC32/size data gathered during download. When necessary, it intelligently manages an external PAR2 utility, parsing its output to track progress, handle block requirements (triggering downloads if needed), and determine success or failure, followed by optional cleanup of the PAR2 files. This ensures data integrity while optimizing for speed.
