@seb26
Created February 10, 2025 01:42

Usage

usage: destroy_nulls.py [-h] [--overwrite] input_filepath [output_filepath]

Process a file to remove BOM and NULL characters, re-encode as UTF8.

positional arguments:
  input_filepath   Path to the input file
  output_filepath  Path to the output result file

options:
  -h, --help       show this help message and exit
  --overwrite      If specified, overwrite the input file in place
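
The way the two path arguments interact can be sketched as a small function. This is only an illustration: `resolve_output_path` is a hypothetical helper name, not part of the script. It mirrors the rule that `--overwrite` writes back to the input file and that an output path is otherwise required.

```python
from typing import Optional


def resolve_output_path(input_filepath: str,
                        output_filepath: Optional[str],
                        overwrite: bool) -> str:
    """Hypothetical helper: pick the file the cleaned content is written to."""
    if overwrite:
        # --overwrite takes precedence, even if an output path was also given
        return input_filepath
    if not output_filepath:
        raise ValueError('output_filepath is required unless --overwrite is set')
    return output_filepath
```

Note that when both `--overwrite` and an output path are supplied, the output path is ignored and the input file is replaced.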

Example

python3 destroy_nulls.py airtable_output.csv --overwrite
# MIT License
# Sebastian Reategui <seb.reategui@gmail.com>

import argparse


def process_file(input_filepath, output_filepath):
    """Strip a UTF-8 BOM and NULL characters, then write the result as UTF-8."""
    # Read the whole file as raw bytes
    with open(input_filepath, 'rb') as f:
        content = f.read()

    # Remove the UTF-8 BOM (Byte Order Mark) if present
    if content[:3] == b'\xef\xbb\xbf':
        content = content[3:]

    # Decode from UTF-8, silently dropping any bytes that are not valid UTF-8
    content = content.decode('utf-8', errors='ignore')

    # Remove NULL characters
    content = content.replace('\x00', '')

    # Convert back to bytes
    content = content.encode('utf-8')

    # Write the processed content to the output file
    with open(output_filepath, 'wb') as f:
        f.write(content)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process a file to remove BOM and NULL characters, re-encode as UTF8.')
    parser.add_argument('input_filepath', type=str, help='Path to the input file')
    parser.add_argument('output_filepath', type=str, nargs='?', help='Path to the output result file')
    parser.add_argument('--overwrite', action='store_true', help='If specified, overwrite the input file in place')
    args = parser.parse_args()

    # With --overwrite, write the result back to the original file
    if args.overwrite:
        args.output_filepath = args.input_filepath
    elif not args.output_filepath:
        parser.error('the following arguments are required: output_filepath')

    process_file(args.input_filepath, args.output_filepath)
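
To see what the cleaning steps do to actual bytes, here is a minimal in-memory sketch of the same sequence. The name `clean_bytes` and the sample data are assumptions made for illustration; the script itself only works on files via `process_file`.

```python
UTF8_BOM = b'\xef\xbb\xbf'


def clean_bytes(content: bytes) -> bytes:
    """Hypothetical in-memory version of the cleaning steps in process_file."""
    # Drop a leading UTF-8 BOM, if any
    if content.startswith(UTF8_BOM):
        content = content[len(UTF8_BOM):]
    # Decode, discarding bytes that are not valid UTF-8
    text = content.decode('utf-8', errors='ignore')
    # Remove NULL characters and re-encode
    return text.replace('\x00', '').encode('utf-8')


# Made-up sample: BOM, a stray NULL, a valid "é" and an invalid 0xFF byte
sample = UTF8_BOM + b'id,name\x00\r\n1,caf\xc3\xa9\xff\r\n'
print(clean_bytes(sample))  # b'id,name\r\n1,caf\xc3\xa9\r\n'
```

Two things are worth keeping in mind. Only the UTF-8 BOM is recognised, so a file that is really UTF-16 would not be converted correctly by these steps. And because of `errors='ignore'`, invalid bytes such as the `0xFF` above are dropped without any warning.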