Best practices for native calls from C#

Time: 2016-03-02 08:55:05

Tags: c# performance

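I would like to know the best practice/design for calling external dependencies from a C# application. My application is distributed as a DLL that is used in other applications.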

I have a class called OCRObject, and I am not sure whether I should make it static.

This is my code that calls the external DLL:

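using System;
using System.Runtime.ExceptionServices;
using System.Runtime.InteropServices;
using System.Text;

/// <summary>
/// A static instance of OCRObject that handles the OCR part of the application. This class
/// calls a native library and the required files must therefore be present in the /Tesseract folder.
/// </summary>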
internal class OCRObject
{
    /// <summary>
    /// Calls the native C++ library and returns a UTF-8 string of the image text.
    /// </summary>
    /// <param name="imagePath">   The full image path.</param>
    /// <param name="tessConfPath">The tesseract configuration path.</param>
    /// <param name="tessLanguage">The tesseract language.</param>
    /// <returns></returns>
    [HandleProcessCorruptedStateExceptions]
    public string GetOCRText(string imagePath, string tessConfPath, string tessLanguage)
    {
        try
        {
            if (StaticObjectHolder.EnableAdvancedLogging)
            {
                Logger.Log(string.Format("Doing OCR on image {0}.", imagePath));
            }
            return this.StringFromNativeUtf8(OCRObject.GetUTF8Text(tessConfPath, tessLanguage, imagePath));
        }
        catch (AccessViolationException ave)
        {
            Logger.Log(ave.ToString(), LogInformationType.Error);
        }
        catch (Exception ex)
        {
            Logger.Log(ex.ToString(), LogInformationType.Error);
        }
        return string.Empty;
    }

    /// <summary>
    /// The DLL Import declaration. The main entry point is GetUTF8Text which is the method in
    /// the native library. This method extracts text from the image and returns a UTF-8 representation of the string.
    /// </summary>
    /// <param name="path">   The path of the configuration files.</param>
    /// <param name="lang">   The language to parse. For example DAN, ENG etc.</param>
    /// <param name="imgPath">The full path of the image to extract text from.</param>
    /// <returns></returns>
    [HandleProcessCorruptedStateExceptions]
    [DllImport(@"\Tesseract\TesseractX64.dll", EntryPoint = "GetUTF8Text", CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr GetUTF8Text(string path, string lang, string imgPath);

    /// <summary>
    /// Converts the returned IntPtr from the native call to a UTF-8 based string.
    /// </summary>
    /// <param name="nativeUtf8">The native UTF8.</param>
    /// <returns></returns>
    [HandleProcessCorruptedStateExceptions]
    private string StringFromNativeUtf8(IntPtr nativeUtf8)
    {
        try
        {
            int len = 0;
            if (nativeUtf8 == IntPtr.Zero)
            {
                return string.Empty;
            }

            while (Marshal.ReadByte(nativeUtf8, len) != 0)
            {
                ++len;
            }

            byte[] buffer = new byte[len];
            Marshal.Copy(nativeUtf8, buffer, 0, buffer.Length);
            string text = Encoding.UTF8.GetString(buffer);
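            // The native buffer is not released here: clearing this local copy of the
            // pointer frees nothing. Releasing it would need a matching native free function.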
            return text;
        }
        catch
        {
            return string.Empty;
        }
    }
}

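My goal is maximum performance, so I would like to know whether this code can be optimized by making the class static or by changing anything else in the code.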

This is the C++ code:

#include "stdafx.h"
#include "OCRWrapper.h"
#include "allheaders.h"
#include "baseapi.h"
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <string>
#include <sys/types.h>
#include <sstream>

OCRWrapper::OCRWrapper()
{
}

//OCRWrapper::~OCRWrapper()
//{
//}

/// <summary>
/// Gets the text from the image in UTF-8. Remember to convert it from UTF-8 again on the caller side.
/// </summary>
/// <param name="path">The tesseract data path.</param>
/// <param name="lang">The tesseract language.</param>
/// <param name="imgPath">The img path.</param>
/// <returns></returns>
char* OCRWrapper::GetUTF8Text(char* path, char* lang, char* imgPath)
{
    try
    {
        /*Stack allocation releases the engine on every exit path, including exceptions*/
        tesseract::TessBaseAPI api;

        if (api.Init(path, lang)) {
            /*Report the failure to the caller; exit() would terminate the host process*/
            fprintf(stderr, "Could not initialize tesseract. Incorrect datapath or incorrect language\n");
            return NULL;
        }

        /*Load the image from the image path*/
        Pix *image = pixRead(imgPath);
        if (image == NULL) {
            api.End();
            return NULL;
        }

        /*Hand the image to the OCR engine*/
        api.SetImage(image);

        // Get OCR result; the caller owns the returned buffer (release with delete[])
        char* imageText = api.GetUTF8Text();

        /*Release the engine and the image*/
        api.End();
        pixDestroy(&image);
        return imageText;
    }
    catch (...)
    {
        /*Return a heap copy; a pointer into the local std::string would dangle after return*/
        std::string errorStr("An error occurred during OCR. ImgPath => " + std::string(imgPath));
        char* message = new char[errorStr.size() + 1];
        std::copy(errorStr.begin(), errorStr.end(), message);
        message[errorStr.size()] = '\0';
        return message;
    }
}

1 answer:

Answer 0 (score: 3)

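Best performance? Use C++/CLI for the interface class. The difference is small but may be relevant. It gets bigger if you can avoid string generation: with C# interop the strings must be marshaled, whereas with C++/CLI you can reuse cached strings. It depends on the low-level API downstream.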
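One way to cut down on string generation at the boundary, sketched here in C++ under stated assumptions, is to let the caller pass in a buffer that it owns and reuses across calls. The names GetUtf8TextInto and RecognizeText are hypothetical and do not appear in the question's code; RecognizeText is only a stand-in for the real OCR call.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the OCR engine; a real wrapper would call Tesseract here.
static std::string RecognizeText(const char* imgPath)
{
    return std::string("text from ") + imgPath;
}

// Hypothetical export: writes the UTF-8 result into a caller-owned buffer and
// returns the number of bytes required, including the terminating zero.
// If the buffer is null or too small, nothing is written, and the caller can
// retry with a buffer of at least the returned size.
extern "C" std::size_t GetUtf8TextInto(const char* imgPath, char* buffer, std::size_t capacity)
{
    const std::string text = RecognizeText(imgPath);
    const std::size_t required = text.size() + 1;
    if (buffer != nullptr && capacity >= required)
    {
        std::memcpy(buffer, text.c_str(), required);
    }
    return required;
}
```

With this shape the native side never hands out memory that the managed side has to free, and a caller that keeps one sufficiently large buffer around avoids an allocation per call. Whether that saving is measurable next to the OCR work itself is a separate question.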

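But as far as OCR goes, I seriously think you are barking up the wrong tree. OCR is a processor-intensive operation, so any optimization of the calls, which are few and far between compared with the processing, is simply not relevant. The times I would optimize this kind of thing are, for example, exchange data streams, where a call can happen hundreds of thousands of times per second with minimal data being forwarded to C# for processing. But for OCR, I seriously doubt it matters. Especially if you do not process the image to begin with, which is the only way to even consider optimizing.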

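How long does a GetOCRText call take? If it is significantly more than 1/1000 of a second, then seriously, you are trying to optimize the wrong element. The call overhead is tiny (much smaller than that).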
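To check this, measure before optimizing. The following is a minimal C++ sketch of such a measurement on the native side, assuming hypothetical helpers named AverageMicroseconds and CallOverheadIsNegligible (neither is part of the question's code). The same check could be done on the C# side with System.Diagnostics.Stopwatch around GetOCRText.

```cpp
#include <chrono>
#include <cstddef>

// Returns the average wall-clock time of one call to `work`, in microseconds.
// Returns 0 when no iterations are requested.
template <typename Work>
double AverageMicroseconds(Work work, std::size_t iterations)
{
    if (iterations == 0)
    {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
    {
        work();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed).count()
           / static_cast<double>(iterations);
}

// True when one call takes more than a millisecond, the rough threshold
// mentioned above; in that case the per-call interop overhead is unlikely
// to be the bottleneck.
inline bool CallOverheadIsNegligible(double averageMicroseconds)
{
    return averageMicroseconds > 1000.0;
}
```

Passing a lambda that runs one OCR call as `work` gives the number to compare against the one-millisecond threshold. If the average comes out far above it, the time is going into the OCR itself and not into the call.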