Question

I'm trying to extract the text of a PDF file with iTextSharp, like this:

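using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static void Main(string[] args)
{
    StringBuilder text = new StringBuilder();
    // Open the PDF; iTextSharp page numbers start at 1
    using (PdfReader reader = new PdfReader(@"D:\a.pdf"))
    {
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // Extract the text of page i and append it to the buffer
            text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
        }
    }
    // Save all extracted text to a file
    System.IO.File.WriteAllText(@"c:/a.txt", text.ToString());
    Console.ReadLine();
}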

The content of my PDF is written in Persian. After running the code above, the result looks like this:

But this is not the correct result. Do I need to set some option in iTextSharp?

Answer 1

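It's hard to say without the original file, but if your characters or words come out in the wrong order, you should try using LocationTextExtractionStrategy, like this: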

text.Append(PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy()));
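The idea behind LocationTextExtractionStrategy is to rebuild the text from where each piece sits on the page rather than from the order in which the pieces appear in the PDF content stream. The dependency-free C# sketch below only illustrates that idea: the `Chunk` type, `LocationOrderDemo.Order` and the 1-unit line tolerance are hypothetical, and iTextSharp's real strategy is more involved (for example, it also accounts for text orientation and decides where to insert spaces from the gaps between chunks).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified text chunk; NOT iTextSharp's own chunk class.
public sealed class Chunk
{
    public string Text { get; }
    public float X { get; }   // horizontal start position on the page
    public float Y { get; }   // baseline position; PDF user space has Y growing upward
    public Chunk(string text, float x, float y) { Text = text; X = x; Y = y; }
}

public static class LocationOrderDemo
{
    // Assumed tolerance: baselines closer than this count as the same line.
    private const float LineTolerance = 1f;

    public static string Order(IEnumerable<Chunk> chunks)
    {
        // 1. Top of the page first (larger Y means higher on the page).
        var byHeight = chunks.OrderByDescending(c => c.Y).ToList();

        // 2. Group chunks whose baselines are within LineTolerance into one line.
        var lines = new List<List<Chunk>>();
        foreach (var chunk in byHeight)
        {
            if (lines.Count > 0 && Math.Abs(lines[lines.Count - 1][0].Y - chunk.Y) < LineTolerance)
                lines[lines.Count - 1].Add(chunk);
            else
                lines.Add(new List<Chunk> { chunk });
        }

        // 3. Within each line, order chunks left to right and join them with a space.
        return string.Join("\n", lines.Select(line =>
            string.Join(" ", line.OrderBy(c => c.X).Select(c => c.Text))));
    }
}

public static class Program
{
    public static void Main()
    {
        // Chunks supplied in an arbitrary "content stream" order.
        var chunks = new List<Chunk>
        {
            new Chunk("world", 60f, 700f),
            new Chunk("second", 10f, 680f),
            new Chunk("Hello", 10f, 700.4f),
            new Chunk("line", 70f, 680f),
        };
        Console.WriteLine(LocationOrderDemo.Order(chunks));
    }
}
```

Running it prints `Hello world` and `second line` on two lines, even though the chunks were supplied out of order.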
