OpenALPR通过自定义训练数据退出分段错误

时间:2015-01-14 18:41:21

标签: c++ ocr tesseract

我正在使用OpenALPR,我已经训练过OCR以识别强制字体。当我尝试使用训练有素的数据时,alpr退出并出现分段错误。

我正在使用版本1.2.0和tesseract 3.03,使用leptonica-1.71

当我使用gdb运行它时,我得到以下堆栈跟踪:

(gdb) bt
#0  0x00007ffff67ded7b in tesseract::Classify::ComputeCharNormArrays(FEATURE_STRUCT*, INT_TEMPLATES_STRUCT*, unsigned char*, unsigned char*) () from /usr/local/lib/libtesseract.so.3
#1  0x00007ffff67e3b6b in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) () from /usr/local/lib/libtesseract.so.3
#2  0x00007ffff6806882 in tesseract::TessClassifier::UnicharClassifySample(tesseract::TrainingSample const&, Pix*, int, int, GenericVector<tesseract::UnicharRating>*) () from /usr/local/lib/libtesseract.so.3
#3  0x00007ffff67e1b22 in tesseract::Classify::CharNormClassifier(TBLOB*, tesseract::TrainingSample const&, ADAPT_RESULTS*) () from /usr/local/lib/libtesseract.so.3
#4  0x00007ffff67e1c95 in tesseract::Classify::DoAdaptiveMatch(TBLOB*, ADAPT_RESULTS*) () from /usr/local/lib/libtesseract.so.3
#5  0x00007ffff67e1f24 in tesseract::Classify::AdaptiveClassifier(TBLOB*, BLOB_CHOICE_LIST*) () from /usr/local/lib/libtesseract.so.3
#6  0x00007ffff67d993d in tesseract::Wordrec::call_matcher(TBLOB*) () from /usr/local/lib/libtesseract.so.3
#7  0x00007ffff67d9986 in tesseract::Wordrec::classify_blob(TBLOB*, char const*, C_COL, BlamerBundle*) () from /usr/local/lib/libtesseract.so.3
#8  0x00007ffff67d6bab in tesseract::Wordrec::classify_piece(GenericVector<SEAM*> const&, short, short, char const*, TWERD*, BlamerBundle*) () from /usr/local/lib/libtesseract.so.3
#9  0x00007ffff67c808d in tesseract::Wordrec::chop_word_main(WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#10 0x00007ffff67d9821 in tesseract::Wordrec::cc_recog(WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#11 0x00007ffff6716e62 in tesseract::Tesseract::recog_word_recursive(WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#12 0x00007ffff6716ff5 in tesseract::Tesseract::recog_word(WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#13 0x00007ffff6708160 in tesseract::Tesseract::tess_segment_pass_n(int, WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#14 0x00007ffff66cf2a5 in tesseract::Tesseract::match_word_pass_n(int, WERD_RES*, ROW*, BLOCK*) () from /usr/local/lib/libtesseract.so.3
#15 0x00007ffff66cf482 in tesseract::Tesseract::classify_word_pass1(tesseract::WordData*, WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#16 0x00007ffff66d26d6 in tesseract::Tesseract::classify_word_and_language(void (tesseract::Tesseract::*)(tesseract::WordData*, WERD_RES*), tesseract::WordData*) () from /usr/local/lib/libtesseract.so.3
#17 0x00007ffff66d2dea in tesseract::Tesseract::RecogAllWordsPassN(int, ETEXT_DESC*, GenericVector<tesseract::WordData>*) () from /usr/local/lib/libtesseract.so.3
#18 0x00007ffff66d3701 in tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int) () from /usr/local/lib/libtesseract.so.3
#19 0x00007ffff66c203d in tesseract::TessBaseAPI::Recognize(ETEXT_DESC*) () from /usr/local/lib/libtesseract.so.3
#20 0x00000000004aadf4 in OCR::performOCR (this=0x93d7c0, pipeline_data=0x7fffe0a3a9b0) at /opt/openalpr/src/openalpr/ocr.cpp:79
#21 0x000000000048845b in plateAnalysisThread (arg=0x7fffffffd380) at /opt/openalpr/src/openalpr/alpr_impl.cpp:261
#22 0x00000000004df217 in tthread::thread::wrapper_function (aArg=0x93d210) at /opt/openalpr/src/openalpr/support/tinythread.cpp:169
#23 0x00007ffff6401182 in start_thread (arg=0x7fffe0a3b700) at pthread_create.c:312
#24 0x00007ffff590e00d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

1 个答案:

答案 0 :(得分:1)

当我看到发生错误时,我正在调试tesseract,因为在文件adaptmatch.cpp上调用了shape_table _-&gt; getShape(id),其id为&gt; =到shape_table_的大小。

作为一种解决方法,我更改了代码以首先检查大小并跳过迭代而不是使用segfault退出。

这种解决方法可能会产生不良后果,但至少它会停止破坏。这是差异:

diff --git a/classify/adaptmatch.cpp b/classify/adaptmatch.cpp
index 0eaf144..b21d980 100644
--- a/classify/adaptmatch.cpp
+++ b/classify/adaptmatch.cpp
@@ -1148,7 +1148,7 @@ void Classify::ExpandShapesAndApplyCorrections(
     fontinfo_id = ClassAndConfigIDToFontOrShapeID(class_id, int_result.Config);
     fontinfo_id2 = ClassAndConfigIDToFontOrShapeID(class_id,
                                                    int_result.Config2);
-    if (shape_table_ != NULL) {
+    if (shape_table_ != NULL && fontinfo_id < shape_table_->NumShapes()) {
       // Actually fontinfo_id is an index into the shape_table_ and it
       // contains a list of unchar_id/font_id pairs.
       int shape_id = fontinfo_id;
@@ -1781,10 +1781,12 @@ void Classify::ComputeCharNormArrays(FEATURE_STRUCT* norm_feature,
         int font_set_id = templates->Class[id]->font_set_id;
         const FontSet &fs = fontset_table_.get(font_set_id);
         for (int config = 0; config < fs.size; ++config) {
-          const Shape& shape = shape_table_->GetShape(fs.configs[config]);
-          for (int c = 0; c < shape.size(); ++c) {
-            if (char_norm_array[shape[c].unichar_id] < pruner_array[id])
-              pruner_array[id] = char_norm_array[shape[c].unichar_id];
+          if (shape_table_->NumShapes() > fs.configs[config]) {
+            const Shape shape = shape_table_->GetShape(fs.configs[config]);
+            for (int c = 0; c < shape.size(); ++c) {
+              if (char_norm_array[shape[c].unichar_id] < pruner_array[id])
+                pruner_array[id] = char_norm_array[shape[c].unichar_id];
+            }
           }
         }
       }