How to get the source file back from pytesseract extraction results

Date: 2019-05-08 05:56:55

Tags: python ocr python-tesseract

The gist: after extracting the OCR/tesseract data from a pool of images, I run:

    for index, row in df.iterrows():
        result = row['text']  # text from the OCR
        file_1 = re.match(r'Mountain', result)
        file_2 = re.match(r'Lake', result)
        if file_1:
            pass  # how do I fetch/get the original file that has the matching word for file_1?
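One detail in the loop above is worth checking before looking up any file: `re.match` only matches at the very beginning of the string, so OCR text that merely contains "Mountain" somewhere in the middle gives `None`. `re.search` scans the whole string. A small demonstration with a made-up OCR string (the sample text is hypothetical):

```python
import re

# Hypothetical OCR output: the word appears mid-text, not at position 0.
text = "Photo taken at Lake Tahoe with Mountain views"

# re.match anchors at the start of the string, so it finds nothing here.
print(re.match(r"Mountain", text))  # None

# re.search scans the whole string and finds the word.
print(re.search(r"Mountain", text) is not None)  # True

# \b word boundaries stop partial hits such as "Mountains" or "Mountaineer".
print(bool(re.search(r"\bMountain\b", "Mountaineer club")))  # False
```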

How do I get the source file that contains the word "Mountain"?

This is still a bit unclear to me. Can anyone help? Thanks!
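A common way to make this possible is to store each image's path in the same row as its OCR text when the table is built, so a match on the text leads straight back to the file. The sketch below assumes each row is a plain dict with hypothetical keys "file" and "text", and uses a hypothetical helper `files_containing`; it is one possible approach, not the asker's code. With a pandas DataFrame that has the same two columns, the equivalent filter would be `df.loc[df['text'].str.contains(r'\bMountain\b'), 'file']`.

```python
import re
from typing import Dict, Iterable, List


def files_containing(rows: Iterable[Dict[str, str]], word: str) -> List[str]:
    """Return the source file of every OCR row whose text contains `word`.

    Each row is assumed to look like {"file": <image path>, "text": <OCR text>}.
    """
    # Whole-word, case-insensitive search anywhere in the OCR text.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [row["file"] for row in rows if pattern.search(row["text"])]


# Hypothetical rows; in practice "text" would come from something like
# pytesseract.image_to_string(Image.open(path)) for each image path.
rows = [
    {"file": "img_001.png", "text": "Snow on the mountain peak"},
    {"file": "img_002.png", "text": "Boats on the Lake at dusk"},
    {"file": "img_003.png", "text": "Mountain and Lake panorama"},
]

print(files_containing(rows, "Mountain"))  # ['img_001.png', 'img_003.png']
print(files_containing(rows, "Lake"))      # ['img_002.png', 'img_003.png']
```

In the original `iterrows()` loop this amounts to adding a column (here hypothetically named `file`) when the DataFrame is built and reading `row['file']` inside `if file_1:`.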

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

class Program
{
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern IntPtr OpenProcess(int dwDesiredAccess, bool bInheritHandle, int dwProcessId);

    // Pointer-sized parameters use IntPtr so the call also works in 64-bit processes.
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool ReadProcessMemory(IntPtr hProcess, IntPtr lpBaseAddress, byte[] lpBuffer, int dwSize, out IntPtr lpNumberOfBytesRead);

    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool CloseHandle(IntPtr hObject);

    // Win32 access right required by ReadProcessMemory
    const int PROCESS_VM_READ = 0x0010;

    static void Main(string[] args)
    {
        // Attach to an already-running process by its (hard-coded) PID
        Process process = Process.GetProcessById(13568);
        IntPtr processHandle = OpenProcess(PROCESS_VM_READ, false, process.Id);
        if (processHandle == IntPtr.Zero)
        {
            Console.WriteLine("OpenProcess failed, error " + Marshal.GetLastWin32Error());
            return;
        }

        // Display the base address of each module associated with the process
        Console.WriteLine("Base addresses of the modules associated are:");
        foreach (ProcessModule module in process.Modules)
        {
            Console.WriteLine(module.ModuleName + " : 0x" + module.BaseAddress.ToInt64().ToString("X"));
        }

        // Get the main module and display its base address.
        // Keep it as IntPtr: casting to int overflows for 64-bit addresses.
        IntPtr baseAddress = process.MainModule.BaseAddress;
        Console.WriteLine("The process's main module's base address is: 0x{0:X}", baseAddress.ToInt64());

        // Read the first 128 bytes of the main module in one call
        // (the original loop re-read the same byte 128 times because ptr never advanced).
        byte[] buffer = new byte[128];
        IntPtr bytesRead;
        if (ReadProcessMemory(processHandle, baseAddress, buffer, buffer.Length, out bytesRead))
        {
            for (int i = 0; i < bytesRead.ToInt32(); i++)
            {
                Console.WriteLine(buffer[i]);
            }
        }
        else
        {
            // P/Invoke failures do not throw exceptions; query the Win32 error code instead
            Console.WriteLine("ReadProcessMemory failed, error " + Marshal.GetLastWin32Error());
        }

        CloseHandle(processHandle);
        Console.ReadLine();
    }
}

0 answers