Tesseract cannot find the trained data file

Time: 2014-05-30 16:00:44

Tags: ios objective-c tesseract objective-c++

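I have included the eng.traineddata file in my project correctly, and it was working fine. Suddenly it started giving me the following error and crashing.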

  

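Error opening data file /var/mobile/Applications/B36E2682-933F-4B12-9B32-4C3F640BE19E/Documents/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!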
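One way to narrow this down is to check whether the file Tesseract is asking for actually exists. The sketch below is plain C (it also compiles inside an Objective-C++ file) and assumes the lookup path has the form <prefix>/tessdata/<lang>.traineddata, as the error message suggests. The helper names expected_traineddata_path and file_is_readable are hypothetical, and the exact lookup rules may differ between Tesseract versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "<prefix>/tessdata/<lang>.traineddata" into out.
   Returns 0 on success, -1 if the result does not fit in out. */
int expected_traineddata_path(const char *prefix, const char *lang,
                              char *out, size_t out_size) {
    assert(prefix != NULL && lang != NULL && out != NULL);
    size_t len = strlen(prefix);
    /* Add a separator only when the prefix does not already end in '/'. */
    const char *sep = (len > 0 && prefix[len - 1] == '/') ? "" : "/";
    int n = snprintf(out, out_size, "%s%stessdata/%s.traineddata",
                     prefix, sep, lang);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Returns 1 if the file can be opened for reading, 0 otherwise. */
int file_is_readable(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return 0;
    }
    fclose(f);
    return 1;
}
```

Calling expected_traineddata_path with the Documents directory and "eng", then passing the result to file_is_readable before initializing Tesseract, would show whether the file is really missing or whether the prefix points at the wrong directory.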

The code I am using:

- (NSString*) pathToLangugeFIle{
    // Set up the tessdata path. This is included in the application bundle
    // but is copied to the Documents directory on the first run.
    NSArray *documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    NSString *documentPath = ([documentPaths count] > 0) ? [documentPaths objectAtIndex:0] : nil;

    NSString *dataPath = [documentPath stringByAppendingPathComponent:@"tessdata"];
    NSFileManager *fileManager = [NSFileManager defaultManager];
    // If the expected store doesn't exist, copy the default store.
    if (![fileManager fileExistsAtPath:dataPath]) {
        // get the path to the app bundle (with the tessdata dir)
        NSString *bundlePath = [[NSBundle mainBundle] bundlePath];
        NSString *tessdataPath = [bundlePath stringByAppendingPathComponent:@"tessdata"];
        if (tessdataPath) {
            [fileManager copyItemAtPath:tessdataPath toPath:dataPath error:NULL];
        }
    }

    setenv("TESSDATA_PREFIX", [[documentPath stringByAppendingString:@"/"] UTF8String], 1);

    return dataPath;
}

- (NSString*) OCRImage:(UIImage*)src{
    // init the tesseract engine.
    tesseract::TessBaseAPI *tesseract = new tesseract::TessBaseAPI();

    tesseract->Init([[self pathToLangugeFIle] cStringUsingEncoding:NSUTF8StringEncoding], "eng");
    tesseract->SetVariable("tessedit_char_whitelist", ":-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");

    // Convert the UIImage to a cv::Mat and pass the pixel data to Tesseract
    cv::Mat toOCR = [src CVGrayscaleMat];

    NSLog(@"%d", toOCR.channels());

    tesseract->SetImage((uchar*)toOCR.data, toOCR.size().width, toOCR.size().height,
                        toOCR.channels(), toOCR.step1());

    tesseract->Recognize(NULL);

    char *utf8Text = tesseract->GetUTF8Text();
    NSString *result = [NSString stringWithUTF8String:utf8Text];

    // GetUTF8Text() returns a buffer that the caller must free with delete [].
    delete [] utf8Text;
    // Release the engine so it is not leaked on every call.
    tesseract->End();
    delete tesseract;

    return result;
}
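Note that the copy step above runs only when the tessdata directory itself is missing, and it passes error:NULL, so a failed or partial copy goes unnoticed. As a rough illustration of checking the individual file and reporting the failure, here is a minimal sketch in plain C. The function name copy_file_if_missing is hypothetical, and in an iOS project the same checks would more naturally use NSFileManager with an NSError.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Copies src to dst unless dst can already be opened for reading.
   Returns 0 on success (or if dst already exists), otherwise an errno value. */
int copy_file_if_missing(const char *src, const char *dst) {
    assert(src != NULL && dst != NULL);

    FILE *existing = fopen(dst, "rb");
    if (existing != NULL) {
        fclose(existing);
        return 0;  /* nothing to do */
    }

    FILE *in = fopen(src, "rb");
    if (in == NULL) {
        return errno ? errno : EIO;  /* e.g. the source is not in the bundle */
    }
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        int e = errno ? errno : EIO;
        fclose(in);
        return e;
    }

    char buf[4096];
    size_t n;
    int err = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            err = errno ? errno : EIO;
            break;
        }
    }
    if (err == 0 && ferror(in)) {
        err = EIO;
    }
    if (fclose(out) != 0 && err == 0) {
        err = errno ? errno : EIO;
    }
    fclose(in);
    return err;
}
```

A nonzero return value here would point at the copy step rather than at Tesseract itself.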

1 answer:

Answer 0 (score: 0)

    // Copy the training data into the documents directory
    NSArray* trainingDataSuffix = [NSArray arrayWithObjects:@"DangAmbigs",@"freq-dawg",@"inttemp",@"normproto",@"pffmtable",@"traineddata",@"unicharset",@"user-words",@"word-dawg",nil];

    // Get the path to the resource files
    NSString* bundlePath = [[NSBundle mainBundle] bundlePath];
    // Hold a potential error
    NSError* error = nil;
    // Get the contents of the resource directory
    NSArray* dirListing = [[NSFileManager defaultManager] contentsOfDirectoryAtPath:bundlePath error:&error];
    // Boolean to determine whether we have already created a directory or not
    BOOL createdDirectory = NO;
    // The path to the documents directory when appended with the tessdata folder
    NSString* documentsDirectory = [[App getHiddenDocumentPath:@""] stringByAppendingPathComponent:@"tessdata"];
    // Loop the resource files
    for(NSString* file in dirListing)
    {
        // Loop the possible extensions we are looking for
        for(NSString* extension in trainingDataSuffix)
        {
            // Check if the extension is one of these extensions we have been looking for
            if([[file pathExtension] isEqualToString:extension])
            {
                // Check if we have created the directory
                if(!createdDirectory)
                {
                    // Create the directory
                    [[NSFileManager defaultManager] createDirectoryAtPath:documentsDirectory withIntermediateDirectories:YES attributes:nil error:&error];
                    // If we have an error tell us what it is
                    if(error != nil)
                    {
                        NSLog(@"Error: %@",error);
                        error = nil;
                    }
                    // If not, tell the loop we have created the directory so we don't have to do it again
                    else createdDirectory = YES;
                }
                // Get the path of the file in the tessdata directory
                NSString* fileInDocumentsDir = [documentsDirectory stringByAppendingPathComponent:[file lastPathComponent]];
                // Check if the file already exists
                if(![[NSFileManager defaultManager] fileExistsAtPath:fileInDocumentsDir])
                {
                    // If not, copy the file to the tessdata directory
                    [[NSFileManager defaultManager] copyItemAtPath:[bundlePath stringByAppendingPathComponent:file] toPath:fileInDocumentsDir error:&error];
                    // If we have an error tell us what it is
                    if(error != nil)
                    {
                        NSLog(@"Error: %@",error);
                        error = nil;
                    }
                }
                // We have found a valid extension, it's unlikely we'll find another so break the loop
                break;
            }
        }
    }

    // set the environment variable TESSDATA_PREFIX to the path before the tessdata folder, in this case it's the documents directory
    setenv("TESSDATA_PREFIX",[[App getHiddenDocumentPath:@""] UTF8String],1);

Use this code to copy the trained data files from the app bundle into the tessdata directory. It works fine now.

Source: http://b2cloud.com.au/tutorial/tesseract-ocr-and-cross-compiling-on-ios/
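Both snippets on this page finish by pointing TESSDATA_PREFIX at the parent of the tessdata directory. A minimal sketch of that step in plain C follows; it assumes the value should end with a trailing slash, as the question's code arranges, and the helper name set_tessdata_prefix is hypothetical.

```c
/* Ask strict C compilers to declare setenv(); drop this line if your
   toolchain already does. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sets TESSDATA_PREFIX to parent_dir, appending '/' if it is missing.
   Returns 0 on success, -1 on failure. */
int set_tessdata_prefix(const char *parent_dir) {
    assert(parent_dir != NULL);
    char buf[1024];
    size_t len = strlen(parent_dir);
    const char *sep = (len > 0 && parent_dir[len - 1] == '/') ? "" : "/";
    int n = snprintf(buf, sizeof buf, "%s%s", parent_dir, sep);
    if (n < 0 || (size_t)n >= sizeof buf) {
        return -1;  /* path too long for the local buffer */
    }
    /* The final argument 1 overwrites any existing value. */
    return setenv("TESSDATA_PREFIX", buf, 1);
}
```

The argument would be the directory that contains tessdata (for example the Documents path), not the tessdata directory itself.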