Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.

hpr3998 :: Using open source OCR to digitize my mom's book

How I used open source tools such as gphoto2 and the OCR software tesseract to digitize pages

Hosted by Deltaray on 2023-11-29. Flagged as Clean and released under a CC-BY-SA license.
Tags: ocr, opensource, grep, scripts, programming. Comments: 2.

Duration: 00:30:47


To speed up my workflow, I wrote a bash script that uses the open source programs gphoto2, tesseract, grep, and ImageMagick to digitize my mom's 338-page book. Here is the link to the script:
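
For readers who want a feel for how these tools can fit together, here is a minimal sketch of a per-page capture-and-OCR step. It is not the script from the episode: the function names (`page_name`, `has_text`, `digitize_page`), the file naming scheme, the grayscale conversion, and the use of grep as a sanity check on the OCR output are all assumptions made for illustration. It also assumes a camera supported by gphoto2 is attached, and that ImageMagick is installed with the `convert` command (ImageMagick 7 installs it as `magick`).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: function names, file names, and the grep check
# are assumptions for illustration, not the author's actual script.

# Zero-padded base name for a page number, e.g. 7 -> page_007.
# Pass the number without leading zeros.
page_name() {
    printf 'page_%03d' "$1"
}

# Succeeds if the file contains at least one run of three or more
# letters: a crude check that the OCR step recognized some text.
has_text() {
    grep -Eq '[[:alpha:]]{3,}' "$1"
}

# Photograph, clean up, and OCR a single page.
digitize_page() {
    local base
    base="$(page_name "$1")"

    # Trigger the camera and download the photo under our own file name.
    gphoto2 --capture-image-and-download --filename "${base}.jpg" || return 1

    # Convert the photo to grayscale before OCR (assumed preprocessing step).
    convert "${base}.jpg" -colorspace Gray "${base}_gray.png" || return 1

    # tesseract writes its result to "${base}.txt".
    tesseract "${base}_gray.png" "${base}" || return 1

    # Warn, but do not fail, if the OCR output looks empty.
    has_text "${base}.txt" || echo "warning: ${base}.txt looks empty" >&2
}
```

With the functions loaded, `digitize_page 1` would photograph the page currently in front of the camera and leave `page_001.jpg`, `page_001_gray.png`, and `page_001.txt` in the current directory. Wrapping the call in a loop that waits for a keypress between pages is one way to work through a whole book.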


Comment #1 posted on 2023-11-29 20:39:52 by brian-in-ohio

good show

Enjoyed every minute of this show. It's something I've wanted to try, and now I think I will. Nice little rant at the end, hit the nail on the head. Keep the shows coming.

Comment #2 posted on 2023-12-03 13:19:03 by Deltaray

Thanks, I appreciate that feedback and good luck with your endeavors.
