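I am generating a very large PDF (300+ pages) from user-entered HTML. Thanks to some great samples out there, I have that working very nicely. My next requirement is to generate a dynamic table of contents with internal links to the locations in the PDF where the chapters start. I have part of this working: I can create valid internal PDF links. The part I need help with is that the page numbers are unknown. I have tried creating the main PDF first and then spinning through it to get the page numbers by searching for the text "Chapter 1", but given the size of the document and the number of chapters, that is too slow.
Is it possible to detect the current page number while adding to the document? When I create the PDF from HTML I know when I am at a new chapter, but is there a way to ask iTextSharp which page we are currently on, so I can use that number in my table of contents? That way I could build it alongside the main document and then merge the two. Are there any better ideas out there?
Here is how I generate the PDF from the user-entered HTML: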
Dim document As New Document()
Dim strManualFile As String = "file.pdf"
PdfWriter.GetInstance(document, New FileStream(strManualFile, FileMode.Create, FileAccess.Write, FileShare.ReadWrite))
document.Open()
Dim htmlarraylistBody As List(Of IElement) = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(New StringReader(GetManualHTML()), Nothing)
For l As Integer = 0 To htmlarraylistBody.Count - 1
    document.Add(DirectCast(htmlarraylistBody(l), IElement))
Next
document.Close()
document.Dispose()
Answer 0 (score: 2)
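PdfWriter.GetInstance() returns an object that you can query for the current page number; that is the first thing you should know. If you have control over the HTML, I would inject a flag variable that you can look for in your For loop. If the flag variable is found, do something with it; otherwise just add the content as normal.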
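Just a quick warning: HTMLWorker has been deprecated for a long time and is not being maintained. All work is being done in the XmlWorker library, which supports CSS. If you are using an older version because of the license change, you should probably read this to find out about the myths and facts regarding the older license.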
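Below is a full working example that shows off the flag variable. At the top I create some sample HTML, which you would obviously remove and replace with your real HTML. Then I create a standard document and loop through each item just as you did. Inside the loop I check for the flag variable: if it is found I store it along with the current page number, otherwise I add the element just as you did.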
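This code targets iTextSharp 5.4.4. If you are using an older version of iTextSharp, the Using statements might not work; just convert them to Dim statements and remove the End Using lines (or upgrade to the latest version). See the code for additional comments.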
''//File to write to
Dim TestFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf")
''//Create a flag value to search for. We won't write this to the PDF, it is just for searching.
Dim FlagValue = "!!UNIQUE TEXT!!"
''//Build our sample HTML. The real version of this would get the HTML from another source ideally.
Dim sampleHTML = <body/>
For I As Integer = 1 To 10
    ''//Just before inserting our chapter headings we insert our flag value appended with the current chapter number.
    ''//NOTE: This might need to be played with a little bit. I'm not sure if a new page is created by the previous entity
    ''//      closing or the new entity starting.
    sampleHTML.Add(String.Format("{0}{1}", FlagValue, I))
    sampleHTML.Add(<h1><%= String.Format("Chapter {0}", I) %></h1>)
    ''//Add some paragraphs
    For J As Integer = 1 To 100
        sampleHTML.Add(<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
                       Suspendisse ac arcu porta, tempor justo eu, tincidunt eros.
                       Integer lorem dolor, pretium sit amet vehicula dapibus,
                       faucibus a tellus.</p>)
    Next
Next
''//This will be our collection of chapter numbers and the actual page numbers that they correspond to.
Dim PageNumbers As New Dictionary(Of String, Integer)
''//Standard PDF setup here, nothing special
Using fs As New FileStream(TestFile, FileMode.Create, FileAccess.Write, FileShare.None)
    Using doc As New Document()
        Using writer = PdfWriter.GetInstance(doc, fs)
            doc.Open()
            ''//Parse our HTML
            Dim htmlarraylistBody = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(New StringReader(sampleHTML.ToString()), Nothing)
            ''//Loop through each item
            For Each Elem In htmlarraylistBody
                ''//Some HTML elements freak the system out so you should check if they are content first.
                If Elem.IsContent() Then
                    ''//If the current element is a paragraph and start with our flag value
                    If (TypeOf Elem Is Paragraph) AndAlso DirectCast(Elem, Paragraph).Content.StartsWith(FlagValue) Then
                        ''//Add that to our master collection but DO NOT write it to the PDF
                        PageNumbers.Add(DirectCast(Elem, Paragraph).Content.Replace(FlagValue, ""), writer.PageNumber)
                    Else
                        ''//Otherwise just write to the PDF normally
                        doc.Add(Elem)
                    End If
                End If
            Next
            doc.Close()
        End Using
    End Using
End Using