@lucaslouca
Created April 27, 2018 08:29
Extracts the <body>...</body> element from an HTML file and writes it, pretty-printed, to a new file named after the input (page.html becomes page_body.txt).
from bs4 import BeautifulSoup
import os
import sys


def strip(path: str):
    # Read and parse the HTML file
    with open(path, "r") as html_file:
        content = html_file.read()
    soup = BeautifulSoup(content, 'html.parser')

    # soup.find() returns None when the document has no <body> tag
    body_tag = soup.find('body')
    if body_tag is None:
        sys.exit("No <body> element found in %s" % path)
    body = body_tag.prettify()

    # Write body to new file, e.g. page.html -> page_body.txt
    new_file_path, file_extension = os.path.splitext(path)
    new_file_path += "_body.txt"
    with open(new_file_path, "w") as out:
        out.write(body)


def main():
    if len(sys.argv) <= 1:
        print("Usage: %s file" % sys.argv[0])
        return
    strip(sys.argv[1])


if __name__ == '__main__':
    main()
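
The output file name comes from os.path.splitext, which splits off only the last extension. The sketch below isolates that step so the naming can be checked without touching any files. The helper name body_output_path is hypothetical and is not part of the script above.

```python
import os


def body_output_path(path: str) -> str:
    """Return the path the script would write to for a given input path.

    Hypothetical helper for illustration: it mirrors the splitext-based
    naming used in strip(), dropping the last extension and appending
    "_body.txt".
    """
    root, _extension = os.path.splitext(path)
    return root + "_body.txt"


if __name__ == "__main__":
    for example in ("page.html", "archive.tar.gz", "README"):
        print(example, "->", body_output_path(example))
```

Only the final suffix is dropped, so archive.tar.gz maps to archive.tar_body.txt, and a file with no extension simply gets _body.txt appended.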