Question

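I have tested iTextSharp and iText7 for HTML to PDF conversion. In terms of performance, iTextSharp takes 3 minutes to create 10,000 PDFs, but iText7 takes 17 minutes to create 10,000 PDFs. Since iText7 is the newer version of iTextSharp, I decided to use iText7 for commercial purposes. But performance-wise, iText7 is much slower. So please tell me how to improve the performance of HTML to PDF conversion in iText7.

Test in iText7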

    For i As Integer = 0 To 10000
        HTML = ReadFile '=> Read HTML file from particular location
        'HTML = Replace(HTML) => To Replace the content dynamically
        Dim writer As PdfWriter
        Dim array() As Byte = System.Text.Encoding.ASCII.GetBytes("a")
        writer = New PdfWriter(FileName, New WriterProperties().SetStandardEncryption(array, array, EncryptionConstants.ALLOW_PRINTING,
                          EncryptionConstants.ENCRYPTION_AES_256))
        HtmlConverter.ConvertToPdf(HTML, writer)
    Next

Test in iTextSharp

Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.pdfa
Imports System.IO
Imports iTextSharp.text.html.simpleparser
Imports System.Text
Imports iTextSharp.tool.xml.html
Imports iTextSharp.tool.xml
Imports iTextSharp.tool.xml.pipeline.html

For i As Integer = 0 To 10000
    HTML = ReadFile '=> Read HTML file from particular location
    'HTML = Replace(HTML) => To Replace the content dynamically
    Dim bPDF As Byte()
    Dim ms As New MemoryStream
    Dim doc As Document
    doc = New Document(PageSize.A4, 25, 25, 25, 25)
    Dim txtReader As New StringReader(Html)
    Dim oPdfWriter As PdfWriter
    oPdfWriter = PdfWriter.GetInstance(doc, ms)
    oPdfWriter.SetEncryption(iTextSharp.text.pdf.PdfWriter.ENCRYPTION_AES_128, "q", "a", 2)
    Dim htmlWorker As New HTMLWorker(doc)
    doc.Open()
    htmlWorker.StartDocument()
    htmlWorker.Parse(txtReader)
    htmlWorker.EndDocument()
    htmlWorker.Close()
    doc.Close()
    bPDF = ms.ToArray()
    ' mm is minutes; the original MM would have repeated the month
    Dim FIleName As String = "D:\ItextSharp_" & Now.ToString("ddMMyyyyHHmmssffffff") & ".pdf"
    File.WriteAllBytes(FIleName, bPDF)
Next



Function ReadFile() As String
        ' Read the whole template into a string; Using closes the reader when done
        Using objReader As New System.IO.StreamReader("D:\AS1-Revamp\TestHTML\test.html")
            Return objReader.ReadToEnd()
        End Using
End Function

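I used the code above to test performance. iText7 takes more time than iTextSharp to write the PDF files to the specified path.

Edit: HTML copied/pasted from another question:

Following up on my question "iText7 Performance Issue Compared With iTextSharp", I have sent the HTML file to Mr. Amedee Van Gasse. So please tell me how to improve the performance of iText7.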

<div id = "headerdiv" style="width:540px; float:left; background:#ededed; padding:30px; overflow:hidden;">
<br>
<br>
<br>
<div>
<img border='0' src='D:\AS1-Revamp\TestHTML\newlog.bmp' width='100' height='40'>
</div>
<p style="color:Red;align=center;" >                         Details</p>
<br>
<br>
<table >
<tr  border='0'>
<td  bgcolor='Green'>
<font size="3" color="white">
SDetails
</font>
</td>
</td>
</tr>
<tr border='0'>
<td>
<div id="dvKYC">
<table  border='1'>

<tr>
<td><#lsName#></td>
<td>No:<#lsno#></td>
</tr> 

<tr  border='1'>
<td width=500><#lsAddess#></td>
<td></td>
</tr>

<tr>
<td><#lsContacts#></td>
<td> </td>
</tr> 
</table>
</div>
</td>
</tr>
</table>

<br>

<div >
<table >
<tr  border='0'>
<td  bgcolor='Green'>
<font size="3" color="white">
Status
</font>
</td>
</td>
</tr>
</table>
<table style="width:100%;">
<tr  bgcolor=gray >
<td style="width:30%;text-align: left; font-weight: bold;">UUH  </td>
<td style="width:20%;text-align: left; font-weight: bold;">PN</td>
<td style="width:20%;text-align: left; font-weight: bold;">KC </td>
<td style="width:20%;text-align: left; font-weight: bold;">CC</td>
</tr>
<tr>
<td  style"width:200px;"><#lsHs#></td>
<td ><#lsPN#></td>
<td><#lsKC#></td>
<td><#lsCC#></td>
</tr>
</table>
 </div>



<div >
<table >
<tr  border='0'>
<td  bgcolor='Green'>
<font size="3" color="white">
STD
</font>
</td>
</td>
</tr>
</table>


 <##TT##>


</div>

After applying the following code, I get two errors from ConverterProperties:

1. setCreateAcroForm is not a member of iText.Html2pdf.ConverterProperties

2. setOutlineHandler is not a member of iText.Html2pdf.ConverterProperties

    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        Dim converterProperties As ConverterProperties = New ConverterProperties
        With converterProperties
            .SetBaseUri(".")
            .setCreateAcroForm(False)
            .SetCssApplierFactory(New DefaultCssApplierFactory())
            .SetFontProvider(New DefaultFontProvider())
            .SetMediaDeviceDescription(MediaDeviceDescription.CreateDefault())
            .setOutlineHandler(New OutlineHandler())
            .SetTagWorkerFactory(New DefaultTagWorkerFactory())
        End With
        Dim HTML = ReadFile("Input_Template")
        For i = 0 To 10000
            LicenseKey.LoadLicenseFile("C:\iText7\itextkey-0.xml")
            Dim PDF = "E:\iText\testpdf " & i & ".pdf"
            Dim m As New MemoryStream
            Dim writer As PdfWriter
            Dim array() As Byte = System.Text.Encoding.ASCII.GetBytes("a")
            writer = New PdfWriter(PDF, New WriterProperties().SetStandardEncryption(array, array, EncryptionConstants.ALLOW_PRINTING,
                              EncryptionConstants.ENCRYPTION_AES_256))
            HtmlConverter.ConvertToPdf(HTML, writer, converterProperties)
        Next
    End Sub

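If I comment out those two lines and run my program, an error occurs on the conversion line, i.e. HtmlConverter.ConvertToPdf(HTML, writer, converterProperties).

The error is: "Pdf indirect object belongs to other PDF document. Copy object to current pdf document."

The error occurs because converterProperties is created outside the loop. If I put all the properties inside the loop, it works fine, but is that the right approach performance-wise?

Please help me resolve these three errors.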

Answer 1

The answer to your question is simple: at iText Group, we are constantly improving the iText software, and there is certainly room for improving the performance. However, we won't ever be able to make the pdfHTML add-on as fast as the obsolete HTMLWorker. The reason is simple: HTMLWorker didn't support CSS, HTMLWorker only supported a small selection of tags, and so on... HTMLWorker was very simple and was only to be used for simple needs.

We have created the pdfHTML add-on to support CSS (including functionality to add headers, footers, page numbers, and so on). We support plenty of HTML tags that weren't supported in HTMLWorker. We support absolute positioning of elements in pdfHTML. All of this functionality comes with a cost. That cost is CPU.

It is intellectually unfair of you to compare the CPU use by HTMLWorker with the CPU use by pdfHTML.

This being said: you can already save plenty of time by using ConverterProperties. Right now, you don't provide any ConverterProperties. This means that iText has to instantiate the default properties for every PDF you are creating. If you would create the ConverterProperties up-front, and reuse them, you could already save plenty of time, but you have to understand that the extra functionality provided by pdfHTML comes with a cost in CPU.

This is how you create a ConverterProperties instance:

ConverterProperties converterProperties = new ConverterProperties()
    .setBaseUri(".")
    .setCreateAcroForm(false)
    .setCssApplierFactory(new DefaultCssApplierFactory())
    .setFontProvider(new DefaultFontProvider())
    .setMediaDeviceDescription(MediaDeviceDescription.createDefault())
    .setOutlineHandler(new OutlineHandler())
    .setTagWorkerFactory(new DefaultTagWorkerFactory());

As you can see, we create plenty of default objects: the default CSS applier factory, the default font provider, the default media description, the default outline handler, and the default tag worker factory. Creating each of these objects costs only a tiny bit of time, but when you multiply that time by 10,000 because you create 10,000 documents, the CPU needed to create those default objects can become significant. That is what happens when you convert an HTML file to PDF like this:

HtmlConverter.convertToPdf(
    new FileInputStream("resources/test.html"),
    new FileOutputStream("results/test.pdf"));

Since you are not passing a ConverterProperties parameter, iText will create a new instance of ConverterProperties internally for every document that you convert. All the default components of that ConverterProperties will be null, which means that new instances of the CSS applier factory, the font provider, and so on need to be created for every document.

It will save you some time (but not that much) if you create a ConverterProperties up-front (only once), as well as all the components. It is then important that you reuse that object when converting HTML to PDF:

HtmlConverter.convertToPdf(
    new FileInputStream("resources/test.html"),
    new FileOutputStream("results/test.pdf"),
    converterProperties);
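
The create-once, reuse-many idea can also be illustrated without iText. The following is only a minimal sketch, not pdfHTML code: `Settings` and `convert` are hypothetical stand-ins for `ConverterProperties` and `HtmlConverter.convertToPdf`, and the counter merely records how often the settings object is constructed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReuseSketch {
    // Counts how often the (hypothetical) settings object is constructed.
    static final AtomicInteger constructions = new AtomicInteger();

    // Hypothetical stand-in for ConverterProperties and its default components.
    static final class Settings {
        Settings() {
            constructions.incrementAndGet();
        }
    }

    // Hypothetical stand-in for a conversion call: when no settings are
    // passed, fresh defaults are built for this one document.
    static int convert(String html, Settings settings) {
        if (settings == null) {
            settings = new Settings();
        }
        return html.length(); // placeholder for the real conversion work
    }

    // Converts n documents without passing settings: one construction per document.
    public static int constructionsWithoutReuse(int n) {
        constructions.set(0);
        for (int i = 0; i < n; i++) {
            convert("<p>doc " + i + "</p>", null);
        }
        return constructions.get();
    }

    // Converts n documents with settings created once up-front and reused.
    public static int constructionsWithReuse(int n) {
        constructions.set(0);
        Settings shared = new Settings();
        for (int i = 0; i < n; i++) {
            convert("<p>doc " + i + "</p>", shared);
        }
        return constructions.get();
    }

    public static void main(String[] args) {
        System.out.println("without reuse: " + constructionsWithoutReuse(10_000));
        System.out.println("with reuse:    " + constructionsWithReuse(10_000));
    }
}
```

In this sketch the first loop constructs the settings 10,000 times and the second constructs them once. The stand-in `Settings` holds no per-document state, which is what makes sharing it safe here.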
