从Python调用cpp函数时出现分段错误

时间:2016-04-26 16:50:13

标签: python c++ ctypes tesseract

我试图从python中调用this cpp function

TESS_API BOOL TESS_CALL TessBaseAPIProcessPages(TessBaseAPI* handle, const char* filename, 
  const char* retry_config, int timeout_millisec, TessResultRenderer* renderer)
{
    if (handle->ProcessPages(filename, retry_config, timeout_millisec, renderer))
        return TRUE;
    else
        return FALSE;
}

此函数的最后一个参数是TessResultRendereranother cpp function用于创建TessResultRenderer

TESS_API TessResultRenderer* TESS_CALL TessTextRendererCreate(const char* outputbase)
{
    return new TessTextRenderer(outputbase);
}

现在从我的python中调用它时,我做了以下内容:

outputbase = "stdout"
renderer = tesseract.TessTextRendererCreate(outputbase)
text_out = tesseract.TessBaseAPIProcessPages(api, 
     ctypes.create_string_buffer(path), 
     None, 0, renderer) //Segmentation fault (core dumped) error on this line

但我一直收到Segmentation fault错误。

我的问题是如何从Python调用TessBaseAPIProcessPages

进入代码库的更多参考链接:

referer api

Implementation of processPages(...)

修改

在尝试评论后的建议后,我执行了以下操作,但收到错误:item 1 in _argtypes_ has no from_param method

PTessResultRenderer = ctypes.POINTER(TessResultRenderer)
self.tesseract.TessTextRendererCreate.restype = PTessResultRenderer
outputbase = "stdout"
self.tesseract.TessTextRendererCreate.argtypes = [outputbase] #error here
self.tesseract.TessTextRendererCreate

ReturnVal = ctypes.c_bool
self.tesseract.TessBaseAPIProcessPages.argtypes = [self.api, path, None, 0, PTessResultRenderer]
self.tesseract.TessBaseAPIProcessPages.restype = ReturnVal
self.tesseracto.TessBaseAPIProcessPages

class TessResultRenderer(ctypes.Structure):
    pass

2 个答案:

答案 0 :(得分:3)

有一个使用contrib文件夹中的ctypes的tesseract C-API的示例。然而它似乎有点过时了。 contrib/tesseract-c_api-demo.py

您需要为一些方法设置restypeargtypes。另外,不要忘记在处理程序上调用init函数。以下示例适用于我。它将文本从英文名为“test.bmp”的文件读入text变量。

from ctypes import *
from ctypes.util import find_library

lang = b"eng"
filename = b"test.bmp"
TESSDATA_PREFIX = b"/usr/local/Cellar/tesseract/3.04.01_1/share/tessdata"

path = find_library("libtesseract.dylib")
tesseract = CDLL(path)

class TessBaseAPI(Structure):
    pass
class TessResultRenderer(Structure):
    pass

tesseract.TessBaseAPICreate.restype = POINTER(TessBaseAPI)
tesseract.TessBaseAPIInit3.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p]
tesseract.TessBaseAPIInit3.restype = c_bool
tesseract.TessBaseAPIProcessPages.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p, c_int, POINTER(TessResultRenderer)]
tesseract.TessBaseAPIProcessPages.restype = c_bool
tesseract.TessBaseAPIGetUTF8Text.argtypes = [POINTER(TessBaseAPI)]
tesseract.TessBaseAPIGetUTF8Text.restype = c_char_p

api = tesseract.TessBaseAPICreate()
rc = tesseract.TessBaseAPIInit3(api, TESSDATA_PREFIX, lang);
if (rc):
    tesseract.TessBaseAPIDelete(api)
    print("Could not initialize tesseract.\n")
    exit(3)

success = tesseract.TessBaseAPIProcessPages(api, filename, None , 0, None)

if success:
    text = tesseract.TessBaseAPIGetUTF8Text(api)
    print("="*78)
    print(text.decode("utf-8").strip())
    print("="*78)

输出如下:

==============================================================================
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.
==============================================================================

编辑:根据eryksun的建议,将c_void_p替换为不透明类型。谢谢!

答案 1 :(得分:0)

当您运行数组或者取消引用空指针时,会发生分段错误。如果您使用调试器,它将引导您完成所有代码并准确显示正在发生的事情。