Tesseract提取特定信息

时间:2015-08-04 11:27:50

标签: ocr tesseract

我想扫描此图片,只从图片中获取姓名和城市信息。我该如何获得这些信息?

我正在使用tesseract 3.02。sample image

我必须处理数百张图片,并从中提取特定信息(比如姓名和城市)。

1 个答案:

答案 0 :(得分:1)

这完全取决于您使用的Tesseract SDK。我使用开源G8Tesseract iOS SDK来做一个类似于你想做的事情的项目。如果您使用该框架,这可能会有所帮助。我建议您在创建G8RecognitionOperation时,调用一种方法来检索名为recognitionCompleteBlock的数据。在此方法的完成块中,获取操作的结果并迭代并解析您喜欢的数据。既然你知道你想要的信息就在" Name"之后/之前"社会保障",我会在之前和之后切片所有不需要的文本,然后从那里解剖。像这样:

 G8RecognitionOperation *operation = [[G8RecognitionOperation alloc] initWithLanguage:@"eng"];
// Set up operation...

operation.recognitionCompleteBlock = ^(G8Tesseract *tesseract) {
    // Fetch the recognized text
    NSString *recognizedText = tesseract.recognizedText;

    NSLog(@"%@", recognizedText);

    // GET NAME
    // Split the result into two strings / Index 0 is trash because it is before Name
    NSArray *slice1 = [recognizedText componentsSeparatedByString:@"Name"];
    NSString *slice1String = slice1[1];

    // What comes before "Social" should be the name you are looking for
    NSArray *slice2 = [slice1String componentsSeparatedByString:@"Social"];
    NSString *name = slice2[0];

    //GET CITY (do the same thing here)
    // Split the rest of the result and get the desired data
    NSArray *slice3 = [slice2[1] componentsSeparatedByString:@"City"];
    NSString *slice3String = slice3[1];

    // What comes before "State" should be the city you are looking for
    NSArray *slice4 = [slice3String componentsSeparatedByString:@"State"];
    NSString *city = slice4[0];

    NSLog(@"Applicant Name: %@ | City: %@",name, city);

};