@balfiere
Last active July 13, 2025 19:36
A workaround for python_thai_ocr crashing on some of my PDFs. Run it inside a folder of images: it OCRs each image and appends the text to the output file passed as the first argument. Example usage: ~/scripts/thai_images2ocr ocr.txt
#!/bin/bash
# first argument is the output file
# abort with a usage message if it is missing
output=${1:?usage: $0 OUTPUT_FILE}
# remove the output file if it already exists
rm -f "$output"
# create the output file
touch "$output"
# page counter
i=0
# process each image in the current directory
shopt -s nocaseglob
shopt -s nullglob
for f in *.{jpg,jpeg,tiff,bmp,png}
do
# increase the page counter
i=$((i+1))
# process the image and save output to temporary file
# uses https://github.com/nanonymoussu/python_thai_ocr
python "$HOME/python_thai_ocr/main.py" "$f" temp
# add page header
echo -e "✧˖°─ .✦──── ・ 。゚⟡ ☽ Page ${i} ☾ ⟡ ˚。 ・ ────✦.─ °˖✧\n" >> "$output" # cute version (more at https://emojicombos.com/divider)
# echo -e "===============Page ${i}===============\n" >> "$output" # normal version
# add page content
cat temp >> "$output"
# add empty lines at end of page
echo -e "\n\n" >> "$output"
done
# remove temporary file
# (-f: no error if no images were found and temp was never created)
rm -f temp
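
A note on page order in the loop above: brace expansion produces one glob per extension, so every .jpg file is processed before any .png file, and within each group the names sort lexically (page10.png comes before page2.png). If page order matters, a minimal sketch like the one below could list the files in natural order first. `list_images` is a hypothetical helper, not part of the original script, and it assumes that `sort -V` (GNU coreutils) is available and that file names contain no newlines.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print the image
# files in the current directory, one per line, in natural page order.
list_images() {
    # Run in a subshell so the shopt changes do not leak to the caller.
    (
        shopt -s nocaseglob nullglob
        files=( *.{jpg,jpeg,tiff,bmp,png} )
        # Nothing matched: print nothing rather than an empty line.
        (( ${#files[@]} )) || exit 0
        # sort -V orders page2 before page10 (assumes GNU coreutils).
        # Assumes file names contain no newlines.
        printf '%s\n' "${files[@]}" | sort -V
    )
}
```

The main loop could then read from it with `while IFS= read -r f; do ...; done < <(list_images)`, which keeps the page counter `i` in the current shell.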