iTextSharp PdfReader does not read the new text from a different PDF

Date: 2016-05-02 05:07:12

Tags: vb.net pdf itextsharp

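I have a Windows service application that uses iTextSharp to read the text of PDF files, and I display that text in a text box. It works fine when it reads the first PDF, but when it reads the second PDF the text does not change: the box still shows the text of the first PDF. Here is my code: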

dim vFileName as string
dim vFileEntries as string()
dim vPath as string = "C:\PDF"

if directory.exists(vPath) then
    vFileEntries = directory.getfiles(vPath)

    for each vFileName in vFileEntries
        dim PR as PdfReader = new PdfReader(vFileName)

        for CurrentPage as integer = 1 to PR.NumberOfPages
            RichTextBox1.text = ""
            dim ltestrategy as LocationTextExtractionStrategy = New LocationTextExtractionStrategy
            dim currentext as string = PdfTextExtractor.GetTextFromPage(PR, CurrentPage, ltestrategy)

            RichTextBox1.Text = RichTextBox1.Text + currentext
        next
        PR.close()
    next vFileName
end if

Thanks for any help.

1 answer:

Answer 0 (score: 0)

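With your code as it is now, it looks like RichTextBox1.Text will end up holding only the text of the last page of the last PDF processed: RichTextBox1.text = "" runs at the start of every page, so each pass wipes out everything that was appended before it. The change below keeps the text of every page of every PDF processed in your folder.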

To achieve this, change the following:

for CurrentPage as integer = 1 to PR.NumberOfPages
    RichTextBox1.text = ""
    dim ltestrategy as LocationTextExtractionStrategy = New LocationTextExtractionStrategy
    dim currentext as string = PdfTextExtractor.GetTextFromPage(PR, CurrentPage, ltestrategy)

    RichTextBox1.Text = RichTextBox1.Text + currentext
next

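to:

RichTextBox1.Text = ""   ' clear once, before any file is read

for each vFileName in vFileEntries
    dim PR as PdfReader = new PdfReader(vFileName)

    for CurrentPage as integer = 1 to PR.NumberOfPages
        ' currentext is declared with Dim on every pass, so it never needs resetting
        dim ltestrategy as LocationTextExtractionStrategy = New LocationTextExtractionStrategy
        dim currentext as string = PdfTextExtractor.GetTextFromPage(PR, CurrentPage, ltestrategy)

        RichTextBox1.Text = RichTextBox1.Text + currentext
    next
    PR.close()
next vFileName

The only real change is where RichTextBox1.Text is cleared: once, before the loop over the files, instead of at the start of every page. currentext does not need to be reset, because its Dim statement gives it a fresh value on each pass. This puts the text of all pages of all PDFs into the text box.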
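Since this runs in a service, it may be simpler to build all of the text first and assign it to the control (or log it) once. The sketch below assumes iTextSharp 5.x, where PdfReader lives in iTextSharp.text.pdf and PdfTextExtractor and LocationTextExtractionStrategy live in iTextSharp.text.pdf.parser. The module name PdfTextDump, the function ExtractAllPdfText, and the "*.pdf" file filter are hypothetical additions, not part of the question. A StringBuilder is created once per call, so nothing is cleared between pages or files, and Try/Finally closes each PdfReader even if extraction throws.

```vbnet
Imports System.IO
Imports System.Text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

' Hypothetical helper module (assumes a reference to iTextSharp 5.x).
Public Module PdfTextDump

    ' Returns the text of every page of every PDF in the folder, in file order.
    ' Returns an empty string if the folder does not exist or holds no PDFs.
    Public Function ExtractAllPdfText(folder As String) As String
        ' Created once per call, so text from earlier pages and files is kept.
        Dim sb As New StringBuilder()

        If Not Directory.Exists(folder) Then Return String.Empty

        ' The "*.pdf" filter is an assumption; the question reads every file.
        For Each fileName As String In Directory.GetFiles(folder, "*.pdf")
            Dim reader As New PdfReader(fileName)
            Try
                For page As Integer = 1 To reader.NumberOfPages
                    ' A new strategy per page, as in the question.
                    Dim strategy As New LocationTextExtractionStrategy()
                    sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy))
                Next
            Finally
                ' Release the file even if extraction fails partway through.
                reader.Close()
            End Try
        Next

        Return sb.ToString()
    End Function

End Module
```

On a form it could be used as RichTextBox1.Text = PdfTextDump.ExtractAllPdfText("C:\PDF"). Assigning the control once, rather than appending to RichTextBox1.Text page by page, also avoids rebuilding the control's text on every page.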