I have run into a problem. I am running a Python script that converts PDFs to images and then uses Tesseract to extract their text.
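
    # Imports assumed from the names used below (wi = Wand's Image, Image = PIL's).
    import io

    import pytesseract
    from PIL import Image
    from wand.image import Image as wi

    for filename in path_list:
        print(filename)
        # Render the PDF at 300 DPI and convert it to JPEG.
        pdfFile = wi(filename=filename, resolution=300)
        image = pdfFile.convert('jpeg')
        # Serialize each page to an in-memory JPEG blob.
        imageBlobs = []
        for img in image.sequence:
            imgPage = wi(image=img)
            imageBlobs.append(imgPage.make_blob('jpeg'))
        # Run OCR on each page blob.
        extract = []
        for imgBlob in imageBlobs:
            image = Image.open(io.BytesIO(imgBlob))
            text = pytesseract.image_to_string(image, lang='eng')
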
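After extracting content from 11 PDFs, I get the error below. The PDF files themselves are not the problem, because when I pass any particular PDF on its own, its content is extracted. I am running the script on Ubuntu 16.04.
Any help would be greatly appreciated.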
Error:
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 170, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "ocr_script.py", line 466, in <module>
    gather_details(path_list)
  File "ocr_script.py", line 45, in gather_details
    discover_data('Indexing',discoveryPath,final_meta,start_time)
  File "ocr_script.py", line 165, in discover_data
    text = pytesseract.image_to_string(image, lang='eng')
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 294, in image_to_string
    return run_and_get_output(*args)
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 202, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 172, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: /usr/bin/tesseract is not installed or it's
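
The traceback shows the `OSError: [Errno 12]` being raised inside `subprocess.Popen` while pytesseract starts the tesseract process. Since each PDF works on its own and the failure only appears after about 11 files, the memory use may be accumulating from file to file. A minimal sketch of how that could be checked, using only the standard library: `peak_rss_mib` and `process_with_memory_log` are hypothetical helper names, and `process_one` stands in for the per-PDF convert-and-OCR step above. It assumes Linux, where `ru_maxrss` is reported in KiB.

```python
import resource


def peak_rss_mib():
    """Return this process's peak resident set size in MiB.

    Assumes Linux, where ru_maxrss is in KiB (macOS reports bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def process_with_memory_log(paths, process_one):
    """Call process_one(path) for each path and log peak memory after each.

    process_one is a hypothetical callable standing in for the per-PDF
    work; it is not part of pytesseract or Wand.
    """
    readings = []
    for path in paths:
        process_one(path)
        peak = peak_rss_mib()
        readings.append((path, peak))
        # Peak RSS never decreases; a steady climb across files
        # suggests memory is being retained between iterations.
        print("%s: peak RSS %.1f MiB" % (path, peak))
    return readings
```

If the logged value keeps climbing with every file instead of levelling off after the largest PDF, that would point to memory being held across iterations rather than to any single file.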