Question

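I'm writing a program that uses OCR (tessnet2) to scan image files and extract certain information. That was easy until I found out I would be scanning PDF attachments from an Exchange server.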

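The first problem I'm tackling is how to convert my PDFs to BMP files. From what I can tell about TessNet2 so far, it can only read image files, specifically BMP. So my task now is to convert PDFs of varying length (2 to 15 pages) into BMP images. Once that's done, I can easily scan each image with the code I've already built around TessNet2.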

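I've seen Ghostscript used for this task. I'm just wondering whether there is another free solution, or whether one of you fine people could give me a crash course on how to do this with Ghostscript.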

Answer 1

Found a CodeProject article on converting PDF to images:

http://www.codeproject.com/Articles/57100/Simple-and-Free-PDF-to-Image-Conversion
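
If you would rather call Ghostscript directly, as the question mentions, the command-line version is short. This is only a minimal sketch: the helper name `pdf_to_bmp`, the 300 dpi resolution and the `page-%03d.bmp` naming are my own choices for illustration, and it assumes the Ghostscript executable is on your PATH as `gs` (on Windows it is usually `gswin64c` or `gswin32c`).

```shell
#!/bin/sh
# Hypothetical helper: render every page of a PDF to its own 24-bit BMP file.
# Assumes Ghostscript is installed and reachable as `gs`.
pdf_to_bmp() {
    input_pdf=$1
    output_dir=$2

    # Create the output folder if it does not exist yet
    mkdir -p "$output_dir" || return 1

    # -dNOPAUSE -dBATCH : run non-interactively and exit when done
    # -dSAFER           : restrict file access while interpreting the PDF
    # -sDEVICE=bmp16m   : 24-bit colour BMP output device
    # -r300             : render at 300 dpi (an assumed value; tune it for your OCR)
    # %03d in the output name is replaced by the page number (001, 002, ...)
    gs -dNOPAUSE -dBATCH -dSAFER \
       -sDEVICE=bmp16m -r300 \
       -sOutputFile="$output_dir/page-%03d.bmp" \
       "$input_pdf"
}

# Usage: pdf_to_bmp attachment.pdf ./pages
```

From C# you could launch the same command with `System.Diagnostics.Process` and then feed the resulting BMP files to your TessNet2 code one by one.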

Answer 2

You can also use ImageMagick. And it's completely free! No trial, no payment.

Just download the ImageMagick .exe from here. Install it, then download the NuGet package from here.

Here's the code! Hope this helps! (Even though this question was asked 6 years ago...)

The procedure:

    using System.IO;
    using ImageMagick;

    // Note: Magick.NET needs Ghostscript installed to read PDF files.
    public void PDFToBMP(string output)
    {
        MagickReadSettings settings = new MagickReadSettings();
        // Setting the density to 500 dpi will create an image with better quality
        settings.Density = new Density(500);

        string[] files = GetFiles();
        foreach (string file in files)
        {
            string nameWithoutExtension = Path.GetFileNameWithoutExtension(file);
            string path = Path.Combine(output, nameWithoutExtension);
            using (MagickImageCollection images = new MagickImageCollection())
            {
                // Pass the settings so the density is applied while the PDF is rasterized
                images.Read(file, settings);
                int page = 1;
                foreach (var image in images)
                {
                    // If you want another image format, just change the format here...
                    image.Format = MagickFormat.Bmp;
                    // ...and the extension here. One file per page, so pages don't overwrite each other.
                    image.Write(path + "_" + page + ".bmp");
                    page++;
                }
            }
        }
    }

The GetFiles() function:

    // Requires: using System.Collections.Generic; using System.IO;
    public string[] GetFiles()
    {
        // Replace "your\path" with the folder that holds your PDFs; it is created if missing
        if (!Directory.Exists(@"your\path"))
        {
            Directory.CreateDirectory(@"your\path");
        }

        DirectoryInfo dirInfo = new DirectoryInfo(@"your\path");
        FileInfo[] fileInfos = dirInfo.GetFiles();
        List<string> list = new List<string>();
        foreach (FileInfo info in fileInfos)
        {
            // HACK: Just skip the protected samples file...
            if (info.Name.IndexOf("protected") == -1)
                list.Add(info.FullName);
        }
        return list.ToArray();
    }

C# PDF to BMP for free
