@ashavijit
Created April 9, 2023 22:08
A script that converts a PDF file to plain text using Python and PyPDF2
from typing import List

from PyPDF2 import PdfReader


def convert_pdf_to_txt(pdf_file_path: str) -> str:
    """Return the text of every page in the PDF, joined by newlines."""
    with open(pdf_file_path, 'rb') as f:
        pdf = PdfReader(f)
        pages: List[str] = []
        for page in pdf.pages:
            # Guard against pages with no extractable text (e.g. scanned images).
            text = (page.extract_text() or '').strip()
            pages.append(text)
    return '\n'.join(pages)


if __name__ == '__main__':
    pdf_file_path = 'a.pdf'
    text = convert_pdf_to_txt(pdf_file_path)
    # Write UTF-8 explicitly so the output does not depend on the platform default.
    with open('example.txt', 'w', encoding='utf-8') as f:
        f.write(text)
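
The script handles one hard-coded file. A minimal sketch of how it might be extended to a whole folder follows. The name `convert_directory` and its parameters are hypothetical and not part of the gist. The converter is passed in as a callable, so the sketch itself does not import PyPDF2; in practice you would pass `convert_pdf_to_txt`.

```python
from pathlib import Path
from typing import Callable, List


def convert_directory(
    pdf_dir: str,
    convert: Callable[[str], str],
    encoding: str = "utf-8",
) -> List[Path]:
    """Write a .txt file next to every .pdf in pdf_dir and return the new paths.

    `convert` is any function that maps a PDF path to its text,
    for example convert_pdf_to_txt (assumed to be importable or defined nearby).
    """
    written: List[Path] = []
    # sorted() makes the processing order deterministic across platforms.
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        txt_path = pdf_path.with_suffix(".txt")
        txt_path.write_text(convert(str(pdf_path)), encoding=encoding)
        written.append(txt_path)
    return written
```

Assuming the function above is defined alongside the script, `convert_directory('.', convert_pdf_to_txt)` would write `a.txt` next to `a.pdf`, and likewise for every other PDF in the current directory. Files without a `.pdf` suffix are ignored, and existing `.txt` files with the same stem are overwritten.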