Question

Here is my code:

Dim sr As StreamReader = New StreamReader(args(0))
Dim htmlStr As String = sr.ReadToEnd
sr.Close()

Using document As Document = New Document()
   Using writer As PdfWriter = PdfWriter.GetInstance(document, New FileStream("C:\Test\myfile.pdf", FileMode.Create))
      document.Open()
      ' The error is thrown on the next line.
      XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, New StringReader(htmlStr))
   End Using
   document.Close()
End Using

Can anyone help me figure out what I need to do to fix this? The .htm file I am reading looks fine, and I cannot change the file itself.

Answer 1

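The error message tells you what is wrong: the error is in the HTML.

Somewhere you have a <p> tag followed by a closing tag (only you can tell us which one) that is not </p>.

For example:

This is correct [1]: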

<p>This is a paragraph<br />with a new line</p>

However, this is incorrect [2]:

<p>This is a paragraph</br>with an incorrect new line</p>

This is also invalid [3]:

<b>Some bold text <p>inside a paragraph</b> that is not correctly nested.</p>

The parser will understand [1], but it throws the error you are getting on [2] or [3].
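
To find out which closing tag is the culprit in a large file, a small stack-based nesting check can help. The following is only a rough sketch in Python using the standard `html.parser` module, not part of iTextSharp. The names `NestingChecker` and `find_nesting_error` are made up for illustration, and the check assumes strict XHTML-style input, where every opening tag needs a matching close or a self-closing form such as `<br />`.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Records the first closing tag that does not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, line) for every tag that is currently open
        self.error = None  # message describing the first mismatch, if any

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> open and close in one step.
        pass

    def handle_endtag(self, tag):
        if self.error is not None:
            return  # only the first problem is reported
        line = self.getpos()[0]
        if not self.stack:
            self.error = f"line {line}: </{tag}> has no opening tag"
            return
        open_tag, open_line = self.stack.pop()
        if open_tag != tag:
            self.error = (f"line {line}: found </{tag}> but expected "
                          f"</{open_tag}> (opened on line {open_line})")


def find_nesting_error(html):
    """Return None if the tags nest correctly, otherwise a description."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    if checker.error is None and checker.stack:
        tag, line = checker.stack[-1]
        return f"line {line}: <{tag}> is never closed"
    return checker.error


if __name__ == "__main__":
    samples = [
        "<p>This is a paragraph<br />with a new line</p>",
        "<p>This is a paragraph</br>with an incorrect new line</p>",
        "<b>Some bold text <p>inside a paragraph</b> that is not correctly nested.</p>",
    ]
    for sample in samples:
        print(find_nesting_error(sample) or "OK")
```

Run against the three snippets above, this accepts [1] and reports the stray `</br>` in [2] and the early `</b>` in [3]. Because it assumes strict XHTML, it also flags a bare `<br>` as never closed, so treat its output as a hint about where to look, not as a prediction of what the PDF parser will accept.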

Converting HTML to PDF with iTextSharp - invalid nested p tag found