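Lately I have been wondering whether there is a tool for recognizing text in images. OCR came to mind; in China, Hanwang (汉王) OCR is one of the more impressive products. But could the same thing be done with Python? After a lot of searching through material on Python in this area, I came across PyTesser, a fun little program. Here it is for sharing and discussion:
PyTesser is an optical character recognition (OCR) module for Python. It works together with the Tesseract OCR engine: it extracts the text from an image or an image file and returns it as a string.
To use PyTesser you do not need to install the Tesseract OCR engine separately (the download bundles a Windows executable, as the official description below notes), but you must first install PIL (the Python Imaging Library, Python's image-processing library).
Official description: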
PyTesser is an Optical Character Recognition module for Python. It takes as input an image or image file and outputs a string.
PyTesser uses the Tesseract OCR engine, converting images to an accepted format and calling the Tesseract executable as an external script. A Windows executable is provided along with the Python scripts. The scripts should work in other operating systems as well.
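To make the "calling the Tesseract executable as an external script" part more concrete, here is a minimal sketch of the same idea in plain Python 3. This is not PyTesser's own code: the function names (`build_tesseract_command`, `ocr_image_file`, `ocr_image`) and the scratch file names are made up for illustration, and it assumes a `tesseract` executable is on your PATH and that it accepts BMP input. The optional `lang` argument is passed through Tesseract's `-l` option, which only helps if the matching language data is installed.

```python
import os
import subprocess
import tempfile


def build_tesseract_command(image_path, output_base, lang=None, exe="tesseract"):
    """Build the argument list for: tesseract IMAGE OUTPUT_BASE [-l LANG]."""
    command = [exe, image_path, output_base]
    if lang:
        # -l selects the language data, e.g. "eng" (assumes it is installed).
        command += ["-l", lang]
    return command


def ocr_image_file(image_path, lang=None, exe="tesseract"):
    """Run Tesseract on an image file and return the recognized text."""
    with tempfile.TemporaryDirectory() as scratch_dir:
        # Tesseract appends ".txt" to the output base name it is given.
        output_base = os.path.join(scratch_dir, "ocr_result")
        command = build_tesseract_command(image_path, output_base, lang, exe)
        subprocess.run(command, check=True)  # raises if Tesseract fails
        with open(output_base + ".txt", encoding="utf-8") as handle:
            return handle.read()


def ocr_image(image, lang=None, exe="tesseract"):
    """Save a PIL image object to a scratch BMP file, then OCR that file."""
    with tempfile.TemporaryDirectory() as scratch_dir:
        scratch_path = os.path.join(scratch_dir, "scratch.bmp")
        image.save(scratch_path)  # PIL picks the format from the extension
        return ocr_image_file(scratch_path, lang, exe)
```

With PIL installed, something like `ocr_image(Image.open("scan.png"))` would return the recognized text, where `scan.png` is a placeholder file name. Keeping the command construction in its own small function makes it easy to check what will be executed without actually running Tesseract.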
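PyTesser official download page: http://code.google.com/p/PyTesser/downloads/list
PIL library: http://www.pythonware.com/products/pil/
However, in my testing I found that it only recognizes English text reasonably well; it could not recognize Chinese at all!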