I am using a NuGet library called IronOCR to convert PDF files to text.
Here is the code:
    public ActionResult GetFileText(string filepath)
    {
        try
        {
            filepath = Server.MapPath(filepath);
            // Path.GetExtension copes with extra dots in the path; compare case-insensitively.
            string extension = System.IO.Path.GetExtension(filepath);
            if (string.Equals(extension, ".pdf", StringComparison.OrdinalIgnoreCase))
            {
                var Ocr = new AdvancedOcr()
                {
                    CleanBackgroundNoise = false,
                    ColorDepth = 4,
                    ColorSpace = AdvancedOcr.OcrColorSpace.Color,
                    EnhanceContrast = false,
                    DetectWhiteTextOnDarkBackgrounds = false,
                    RotateAndStraighten = false,
                    Language = IronOcr.Languages.English.OcrLanguagePack,
                    EnhanceResolution = false,
                    InputImageType = AdvancedOcr.InputTypes.Document,
                    ReadBarCodes = true,
                    Strategy = AdvancedOcr.OcrStrategy.Fast
                };
                var Results = Ocr.ReadPdf(filepath);
                var Pages = Results.Pages;
                var FullPdfText = Results.Text;
                FullPdfText = HttpUtility.HtmlEncode(FullPdfText);
                FullPdfText = "<p>" + FullPdfText;
                FullPdfText = FullPdfText.Replace("\r\n", @"</p><p>");
                FullPdfText = FullPdfText.Replace("\r", "<\r>");
                ViewData["FileViewer"] = HttpUtility.HtmlEncode(FullPdfText);
                return Content(FullPdfText);
            }
            else
            {
                var Ocr = new AutoOcr();
                var Result = Ocr.Read(filepath);
                return Content(Result.Text);
            }
        }
        catch (Exception ex)
        {
            return Content(ex.Message);
        }
    }
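This code works fine, but only when I run it from Visual Studio. When I host the whole solution on IIS, it throws an exception: "IronOcr.ReadPdf - Could not extract images from PDF, it may be encrypted, not a valid PDF document, or you may not have sufficient file permissions:". For image file conversion, it shows the following error: "The type initializer for 'NativeMagickSettings' threw an exception." The library appears to be a paid product, and I am using the trial or demo version. Could that be the cause?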
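To narrow this down, here is a minimal diagnostic sketch I could run on the IIS box. It assumes the cause might be one of the things the first message names (file permissions) and that the real reason for the second error is hidden in an inner exception, which is usually the case for type initializer failures. `IisDiagnostics`, `DescribeFileAccess`, and `DescribeException` are hypothetical names of my own; none of this is part of IronOCR, and it only uses standard .NET calls.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical diagnostics helper; nothing here is part of IronOCR.
public static class IisDiagnostics
{
    // Reports whether the account running the current process can open the file.
    public static string DescribeFileAccess(string path)
    {
        string user = Environment.UserName;

        // File.Exists also returns false when the caller lacks permission to read the file.
        if (!File.Exists(path))
        {
            return "Not found or not visible to '" + user + "': " + path;
        }

        try
        {
            using (FileStream stream = File.OpenRead(path))
            {
                return "Readable by '" + user + "': " + stream.Length + " bytes";
            }
        }
        catch (UnauthorizedAccessException ex)
        {
            return "Access denied for '" + user + "': " + ex.Message;
        }
        catch (IOException ex)
        {
            return "I/O error for '" + user + "': " + ex.Message;
        }
    }

    // Joins the messages of an exception and all of its inner exceptions,
    // so the underlying cause of a type initializer failure becomes visible.
    public static string DescribeException(Exception ex)
    {
        var builder = new StringBuilder();
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            if (builder.Length > 0)
            {
                builder.Append(" --> ");
            }
            builder.Append(current.GetType().Name).Append(": ").Append(current.Message);
        }
        return builder.ToString();
    }
}
```

In the action above, calling `IisDiagnostics.DescribeFileAccess(filepath)` just before `Ocr.ReadPdf(filepath)` would show which account IIS is using and whether it can read the file, and returning `Content(IisDiagnostics.DescribeException(ex))` from the `catch` block would expose the inner exception behind the `NativeMagickSettings` error. This does not test the trial-license theory; it only rules the permission and hidden-cause possibilities in or out.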