If you want to extract text from an image, you may enjoy using tesseract. It's an old-school tool that works quite well, and it's also very lightweight. You'll need to install the tool on your system before you can access it from Python.
# For Ubuntu
apt install tesseract-ocr
# For Mac
brew install tesseract
Once that is installed, you can install the Python bindings in your virtual environment.
python -m pip install pytesseract
Once that's all installed, you can use pytesseract from inside your Jupyter notebook!
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

img = Image.open('path/to/img.png')
print(pytesseract.image_to_string(img))
You can even get the bounding boxes out per character if you'd like.
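Here's a rough sketch of how that could look. It relies on `pytesseract.image_to_boxes`, which returns plain text with one character per line followed by its box coordinates and a page number. The `CharBox` tuple and the `parse_boxes` and `char_boxes` helpers are names made up for this example, not part of pytesseract. As far as I can tell the coordinates are measured from the bottom-left corner of the image, so double-check that before drawing anything.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One recognised character and its box (hypothetical helper type)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_boxes(raw: str) -> List[CharBox]:
    """Parse text shaped like 'H 10 20 30 40 0', one character per line."""
    boxes = []
    for line in raw.splitlines():
        # Split from the right so the character itself is kept intact.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, left, bottom, right, top, _page = parts
        boxes.append(CharBox(char, int(left), int(bottom), int(right), int(top)))
    return boxes


def char_boxes(path: str) -> List[CharBox]:
    """Run tesseract on the image at `path` and return per-character boxes."""
    # Imported here so the parser above can be used without tesseract installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return parse_boxes(pytesseract.image_to_boxes(img))
```

Calling `char_boxes('path/to/img.png')` should then give you a list that you can loop over, for example to draw rectangles on top of the original image.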
While tesseract works quite well in many applications, it's not
a perfect solution. It mainly works on images with a white background
that contain text generated by a printer or a computer.
When you're working with handwritten notes scattered all over a page,
your mileage may vary.