ABBYY FineReader SDK: how to define a minimum recognition rate

Time: 2013-03-04 14:34:05

Tags: .net ocr abbyy finereader

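Has anyone used the ABBYY FineReader SDK 10? I'm curious whether it is possible to get a success rate for the data extraction after an image has been processed.

In our scenario we have a workflow that collects data from images. If the recognition result is below 90%, we send the batch for visual verification/correction.

I'm using .NET with the SDK. I know that isn't so important, but I mention it just in case.

How can I get that number? Thanks for any suggestions.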

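3 answers:

Answer 0: (score: 1)

There is no "global recognition confidence" property. Developers are expected to compute it themselves using their own confidence criteria. The simplest way is to iterate over every character and check the CharParams.IsSuspicious property. Here is a code sample for FREngine 11 (C#):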

    //Statistics counters 

    //Count of all suspicious symbols in layout
    private int suspiciousSymbolsCount;
    //Count of all unrecognized symbols in layout
    private int unrecognizedSymbolsCount;
    //Count of all nonspace symbols in layout
    private int allSymbolsCount;
    //Count of all words in layout
    private int allWordsCount;
    //Count of all non-dictionary words in layout
    private int notDictionaryWordsCount;
    private void processImage()
    {
        // Create document
        FRDocument document = engineLoader.Engine.CreateFRDocument();

        try {
            // Add image file to document
            displayMessage( "Loading image..." );
            string imagePath = Path.Combine( FreConfig.GetSamplesFolder(), @"SampleImages\Demo.tif" );

            document.AddImageFile( imagePath, null, null );

            //Recognize document
            displayMessage( "Recognizing..." );
            document.Process( null );

            // Calculate text statistics
            displayMessage( "Calculating statistics..." );
            clearStatistics();
            for( int i = 0; i < document.Pages.Count; i++ ) {
                calculateStatisticsForLayout( document.Pages[i].Layout );
            }

            //show calculated statistics
            displayStatistics();

        } catch( Exception error ) {
            MessageBox.Show( this, error.Message, this.Text, MessageBoxButtons.OK, MessageBoxIcon.Error );
        }
        finally {
            // Close document
            document.Close();
        }
    }
    private void calculateStatisticsForLayout( Layout layout )
    {    
        LayoutBlocks blocks = layout.Blocks;
        for( int index = 0; index < blocks.Count; index++ ) {
            calculateStatisticsForBlock( blocks[index] );
        }
    }

    void calculateStatisticsForBlock( IBlock block )
    {           
        if( block.Type == BlockTypeEnum.BT_Text ) {
            calculateStatisticsForTextBlock( block.GetAsTextBlock() );
        } else if( block.Type == BlockTypeEnum.BT_Table ) {
            calculateStatisticsForTableBlock( block.GetAsTableBlock() );
        }
    }

    void calculateStatisticsForTextBlock( TextBlock textBlockProperties )
    {
        calculateStatisticsForText( textBlockProperties.Text );
    }

    void calculateStatisticsForTableBlock( TableBlock tableBlockProperties )
    {
        for( int index = 0; index < tableBlockProperties.Cells.Count; index++ ) {
            calculateStatisticsForBlock( tableBlockProperties.Cells[index].Block );
        }
    }

    void calculateStatisticsForText( Text text ) 
    {
        Paragraphs paragraphs = text.Paragraphs;
        for( int index = 0; index < paragraphs.Count; index++ ) {
            calculateStatisticsForParagraph( paragraphs[index] );
        }
    }

    void calculateStatisticsForParagraph( Paragraph paragraph )
    {
        calculateCharStatisticsForParagraph( paragraph );

        calculateWordStatisticsForParagraph( paragraph );
    }

    void calculateCharStatisticsForParagraph( Paragraph paragraph )
    {
        for( int index = 0; index < paragraph.Text.Length; index++ )
        {
            calculateStatisticsForChar( paragraph, index );
        }
    }

    void calculateStatisticsForChar( Paragraph paragraph, int charIndex )
    {
        CharParams charParams = engineLoader.Engine.CreateCharParams();
        paragraph.GetCharParams( charIndex, charParams );
        if( charParams.IsSuspicious ) 
        {
            suspiciousSymbolsCount++;
        }

        if( isUnrecognizedSymbol( paragraph.Text[charIndex] ) ) 
        {
            unrecognizedSymbolsCount++;
        }

        if( paragraph.Text[charIndex] != ' ' ) 
        {
            allSymbolsCount++;
        }
    }

    void calculateWordStatisticsForParagraph( Paragraph paragraph )
    {
        allWordsCount += paragraph.Words.Count;

        for( int index = 0; index < paragraph.Words.Count; index++ ) 
        {
            if( !paragraph.Words[index].IsWordFromDictionary ) 
            {
                notDictionaryWordsCount ++;
            }
        }
    }

    bool isUnrecognizedSymbol( char symbol )
    {
        //0x005E ('^') is the special character the FREngine recognizer uses for an unrecognized symbol
        return ( symbol == 0x005E );
    }

    void displayStatistics()
    {
        labelAllSymbols.Text = "All symbols: " + allSymbolsCount.ToString();
        labelSuspiciosSymbols.Text = "Suspicious symbols: " + suspiciousSymbolsCount.ToString();
        labelUnrecognizedSymbols.Text = "Unrecognized symbols: " + unrecognizedSymbolsCount.ToString();

        labelAllWords.Text = "All words: " + allWordsCount.ToString();
        labelNotDictionaryWords.Text = "Non-dictionary words: " + notDictionaryWordsCount.ToString();
    }
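
To turn these counters into the single number the question asks for, one option is to treat every non-suspicious symbol as correctly recognized and compare the resulting percentage with the 90% threshold. The sketch below is only an illustration and not part of the ABBYY sample: `RecognitionStats`, `EstimateRate` and `NeedsVerification` are hypothetical names, and both the "non-suspicious means correct" rule and the handling of an empty page are assumptions.

```csharp
using System;

// Hypothetical helper, not part of the FREngine API.
public static class RecognitionStats
{
    // Estimated recognition rate in percent, computed from the counters
    // collected above (allSymbolsCount, suspiciousSymbolsCount).
    // Assumption: a symbol that is not flagged as suspicious counts as correct.
    public static double EstimateRate(int allSymbols, int suspiciousSymbols)
    {
        // Assumption: a page with no symbols is reported as 0% so it goes to review.
        if (allSymbols <= 0)
        {
            return 0.0;
        }

        // The sample counts suspicious flags on every character (spaces included)
        // but counts only non-space symbols, so clamp to keep the result in 0..100.
        int suspicious = Math.Min(Math.Max(suspiciousSymbols, 0), allSymbols);
        return 100.0 * (allSymbols - suspicious) / allSymbols;
    }

    // True when the batch should go to visual verification/correction.
    public static bool NeedsVerification(double ratePercent, double minRatePercent = 90.0)
    {
        return ratePercent < minRatePercent;
    }
}
```

For example, `RecognitionStats.NeedsVerification(RecognitionStats.EstimateRate(allSymbolsCount, suspiciousSymbolsCount))` could be called right after `displayStatistics()` to decide whether the batch is routed to manual review.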

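Answer 1: (score: 0)

IMHO there is no such "global confidence" value, but you can easily get one by taking the confidence of each character and averaging the values. That said, I think you should send your request to ABBYY's forum or support email address to hear what they recommend.

Even if I used the engine, I probably couldn't tell you what confidence to expect, because it all depends on the image quality, the font size, and so on: there is no such thing as an "average document" that the industry uses to establish baseline figures.

Good luck!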
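
A minimal sketch of that averaging step is shown here. It assumes the per-character confidence values have already been read from the engine into a collection of integers. How they are obtained, and on what scale, is not covered and should be checked against the SDK documentation. `ConfidenceAverager` and `Average` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the FREngine API.
public static class ConfidenceAverager
{
    // Arithmetic mean of per-character confidence values.
    // Returns null when there are no characters, so the caller can decide
    // how to treat an empty page instead of getting a misleading 0 or 100.
    public static double? Average(IEnumerable<int> confidences)
    {
        long sum = 0;
        int count = 0;
        foreach (int confidence in confidences)
        {
            sum += confidence;
            count++;
        }

        if (count == 0)
        {
            return null;
        }
        return (double)sum / count;
    }
}
```

A plain mean can hide a few very poor characters inside a long page, so it may be worth tracking the minimum or the count of low-confidence characters alongside it.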

Answer 2: (score: 0)

The FRE SDK's recognition results contain text only in Text or Table blocks. I suggest using a global word-count variable.

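  1. Run an asynchronous method that iterates over the words and counts the suspicious characters in each word (IsSuspicious).
  2. Find the total number of words on each page that contain suspicious characters.
  3. Compute (words with suspicious characters) / (total words), then multiply the result by 100.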

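    For example, 2/4 equals 0.5, and 0.5 * 100 = 50%. Strictly speaking, this is the share of words that contain suspicious characters; the estimated accuracy is 100% minus that value, which in this example is also 50%. The other answer above, from ABBYY, provides a code sample for checking suspicious characters and confidence.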
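
The three steps above can be sketched as follows, assuming the number of suspicious characters in each word has already been collected into a list (one entry per word). `WordStats` and `SuspiciousWordPercent` are hypothetical names, and returning 0 for a page without words is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the FREngine API.
public static class WordStats
{
    // Percentage of words that contain at least one suspicious character.
    // suspiciousCharsPerWord holds one entry per recognized word.
    public static double SuspiciousWordPercent(IReadOnlyList<int> suspiciousCharsPerWord)
    {
        int totalWords = suspiciousCharsPerWord.Count;
        if (totalWords == 0)
        {
            return 0.0; // assumption: no words means nothing to flag
        }

        int wordsWithSuspiciousChars = 0;
        foreach (int suspiciousChars in suspiciousCharsPerWord)
        {
            if (suspiciousChars > 0)
            {
                wordsWithSuspiciousChars++;
            }
        }

        return 100.0 * wordsWithSuspiciousChars / totalWords;
    }
}
```

With four words where two contain suspicious characters, for instance `new[] { 0, 2, 0, 1 }`, the method returns 50, matching the 2/4 example. Subtracting the result from 100 gives the word-level accuracy estimate.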