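I am trying to set up an OCR web service so that I can send images for processing from multiple locations.
I have never done anything with CGI, so I figured it was time to try mod_wsgi. It took me two days to install all the libraries, including opencv and pytesseract. The OCR works fine when I run it the "normal way" (starting a new Python interpreter), but I have had a lot of trouble getting some libraries to work under mod_wsgi, even though they work fine otherwise.
I am now stuck on pytesseract. If I run it with:
tesseract -l myl image.jpe out
everything works.
Even if I do this: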
import pytesseract
from PIL import Image
pytesseract.image_to_string(Image.open('/var/www/path/image.jpe'), lang='myl')
this works as well.
If I do the same thing through mod_wsgi, I get this error in the httpd log file:
mod_wsgi (pid=1836): Exception occurred processing WSGI script '/var/www/path/app.wsgi'.
[Mon May 18 06:28:31 2015] [error] [client IP] Traceback (most recent call last):
[Mon May 18 06:28:31 2015] [error] [client IP] File "/var/www/path/app.wsgi", line 28, in wsgi_app
[Mon May 18 06:28:31 2015] [error] [client IP] output = check_text('a.jpe')
[Mon May 18 06:28:31 2015] [error] [client IP] File "/var/www/path/app.wsgi", line 20, in check_text
[Mon May 18 06:28:31 2015] [error] [client IP] return pytesseract.image_to_string(Image.open('/var/www/path/a.jpe'), lang='myl')
[Mon May 18 06:28:31 2015] [error] [client IP] File "/usr/local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 161, in image_to_string
[Mon May 18 06:28:31 2015] [error] [client IP] boxes=boxes,
[Mon May 18 06:28:31 2015] [error] [client IP] File "/usr/local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 94, in run_tesseract
[Mon May 18 06:28:31 2015] [error] [client IP] stderr=subprocess.PIPE)
[Mon May 18 06:28:31 2015] [error] [client IP] File "/usr/local/lib/python2.7/subprocess.py", line 710, in __init__
[Mon May 18 06:28:31 2015] [error] [client IP] errread, errwrite)
[Mon May 18 06:28:31 2015] [error] [client IP] File "/usr/local/lib/python2.7/subprocess.py", line 1335, in _execute_child
[Mon May 18 06:28:31 2015] [error] [client IP] raise child_exception
[Mon May 18 06:28:31 2015] [error] [client IP] OSError: [Errno 2] No such file or directory
This is my app.wsgi file:
#!/usr/local/bin python2.7
#-*- coding: utf-8 -*-
import os
import sys
from subprocess import check_output
sys.path.append('/var/www/path')
import pytesseract
from PIL import Image

def check_text(image_path):
    # return check_output(['pytesseract', '-l', 'myl', '/var/www/path/a.jpe'])
    return pytesseract.image_to_string(Image.open('/var/www/path/a.jpe'), lang='myl')

def wsgi_app(environ, start_response):
    output = sys.version.encode('utf-8')
    status = '200 OK'
    headers = [('Content-type', 'text/plain'), ('Content-Length', str(len(output)))]
    output = check_text('a.jpe')
    start_response(status, headers)
    return os.getcwd()
    return output

# mod_wsgi need the *application* variable to serve our small app
application = wsgi_app
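As you can see in the source, I also tried check_output from subprocess to start a new pytesseract process myself, but I got the same error.
I built both tesseract and mod_wsgi from source. But again, I am sure this is related to mod_wsgi, because it works when I run it from Python normally.
Update: I had a similar "strange" problem with mod_wsgi and opencv. The question and answer can be found here: Occasional ctypes error importing numpy from mod_wsgi django app
Any suggestions would be appreciated.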
Answer 0 (score: 1)
To solve this, in /usr/local/lib/python2.7/site-packages/pytesseract/pytesseract.py
I changed the line tesseract_cmd = 'tesseract'
to tesseract_cmd = '/usr/local/bin/tesseract'.
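The fix above works most likely because the Apache/mod_wsgi process has a narrower PATH than an interactive shell, so subprocess cannot find a bare `tesseract` and raises `OSError: [Errno 2]`. As an alternative to editing the installed package, here is a minimal sketch of resolving the binary from the WSGI script itself. The helper name `resolve_executable` and the default extra directory `/usr/local/bin` are assumptions for illustration, not part of pytesseract.

```python
import os


def resolve_executable(name, extra_dirs=("/usr/local/bin",), path=None):
    """Return the absolute path of `name`, or None if it cannot be found.

    Hypothetical helper: searches the directories in `path` (defaults to
    this process's PATH), then the directories listed in `extra_dirs`.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    # Skip empty PATH entries, then append the fallback directories.
    search_dirs = [d for d in path.split(os.pathsep) if d] + list(extra_dirs)
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        # Accept only regular files that this process is allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return os.path.abspath(candidate)
    return None
```

In app.wsgi one could then assign the result to `pytesseract.pytesseract.tesseract_cmd` (the same module-level variable edited above) before calling `image_to_string`, and fail with a clear message if the helper returns None. This keeps the change in your own code, so it is not lost when pytesseract is reinstalled or upgraded.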