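iTextSharp HTML to PDF: preserving whitespace

Asked: 2011-09-30 20:03:56

Tags: html vb.net pdf itextsharp

I am using FreeTextBox.dll to get user input, and I store that input as HTML in the database. A sample of the user's input is below: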

                                                                     133 Peachtree St NE
                                                                     Atlanta,  GA 30303
                                                                     404-652-7777

                                                                     Cindy Cooley
                                                                     www.somecompany.com
                                                                     Product Stewardship Mgr

                                                                    9/9/2011
Deidre's Company
123 Test St
Atlanta, GA 30303

Test test.

 

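I want HTMLWorker to preserve the whitespace the user enters, but it strips it out. Is there a way to keep the user's whitespace? Below is an example of how I am creating my PDF document.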

Public Shared Sub CreatePreviewPDF(ByVal vsHTML As String, ByVal vsFileName As String)

        Dim output As New MemoryStream()
        Dim oDocument As New Document(PageSize.LETTER)
        Dim writer As PdfWriter = PdfWriter.GetInstance(oDocument, output)
        Dim oFont As New Font(Font.FontFamily.TIMES_ROMAN, 8, Font.NORMAL, BaseColor.BLACK)

        Using output
            Using writer
                Using oDocument
                    oDocument.Open()
                    Using sr As New StringReader(vsHTML)
                        Using worker As New html.simpleparser.HTMLWorker(oDocument)

                            worker.StartDocument()
                            worker.SetInsidePRE(True)
                            worker.Parse(sr)
                            worker.EndDocument()
                            worker.Close()
                            oDocument.Close()

                        End Using
                    End Using

                    HttpContext.Current.Response.ContentType = "application/pdf"
                    HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("attachment;filename={0}.pdf", vsFileName))
                    HttpContext.Current.Response.BinaryWrite(output.ToArray())
                    HttpContext.Current.Response.End()

                End Using
            End Using
            output.Close()
        End Using


    End Sub

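3 Answers:

Answer 0 (score: 1)

There is a glitch in iText and iTextSharp, but you can fix it fairly easily if you don't mind downloading the source and recompiling it. You need to change two files. Every change I made is commented inline in the code. Line numbers are based on the 5.1.2.0 source, revision 240.

The first is in iTextSharp.text.html.HtmlUtilities.cs. Look for the function EliminateWhiteSpace at line 249 and change it to: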

    public static String EliminateWhiteSpace(String content) {
        // multiple spaces are reduced to one,
        // newlines are treated as spaces,
        // tabs, carriage returns are ignored.
        StringBuilder buf = new StringBuilder();
        int len = content.Length;
        char character;
        bool newline = false;
        bool space = false;//Detect whether we have written at least one space already
        for (int i = 0; i < len; i++) {
            switch (character = content[i]) {
            case ' ':
                if (!newline && !space) {//If we are not at a new line AND ALSO did not just append a space
                    buf.Append(character);
                    space = true;  //flag that we just wrote a space
                }
                break;
            case '\n':
                if (i > 0) {
                    newline = true;
                    buf.Append(' ');
                }
                break;
            case '\r':
                break;
            case '\t':
                break;
            default:
                newline = false;
                space = false;  //reset flag
                buf.Append(character);
                break;
            }
        }
        return buf.ToString();
    }

The second change is in iTextSharp.text.xml.simpleparser.SimpleXMLParser.cs. In the function Go, which starts at line 185, change line 248 to:

if (html /*&& nowhite*/) {//removed the nowhite check from here because that should be handled by the HTML parser later, not the XML parser

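Answer 1 (score: 0)

I would recommend using wkhtmltopdf instead of iText. wkhtmltopdf outputs the HTML exactly as rendered by WebKit (the engine behind Google Chrome and Safari) rather than going through iText's own conversion. It is just a binary that you can call. That being said, I would check the HTML to make sure the user input actually contains paragraphs and/or line breaks. They might be getting stripped out before the conversion.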
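
Calling the binary from .NET could look roughly like the sketch below. It assumes wkhtmltopdf is installed and that the HTML has already been written to a file; the module name, function names, and parameters are hypothetical, and the only command-line form used is the basic `wkhtmltopdf <input> <output>`.

```vbnet
Imports System
Imports System.Diagnostics

Public Module WkHtmlToPdfRunner

    ' Builds the argument string: "<input.html>" "<output.pdf>"
    ' Both paths are quoted so that paths containing spaces survive.
    Public Function BuildArguments(ByVal inputHtmlPath As String, ByVal outputPdfPath As String) As String
        Return String.Format("""{0}"" ""{1}""", inputHtmlPath, outputPdfPath)
    End Function

    ' Runs the wkhtmltopdf binary (exePath is an assumption: wherever it is
    ' installed on your machine) and returns True when it exits with code 0.
    Public Function ConvertHtmlFileToPdf(ByVal exePath As String, ByVal inputHtmlPath As String, ByVal outputPdfPath As String) As Boolean
        Dim startInfo As New ProcessStartInfo(exePath, BuildArguments(inputHtmlPath, outputPdfPath))
        startInfo.UseShellExecute = False
        startInfo.CreateNoWindow = True

        Using proc As Process = Process.Start(startInfo)
            proc.WaitForExit()
            Return proc.ExitCode = 0
        End Using
    End Function

End Module
```

Because wkhtmltopdf follows normal HTML rendering rules, runs of plain spaces still collapse unless the markup uses something like `<pre>` or `&nbsp;`, so the stored HTML may need the same attention either way.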

Answer 2 (score: 0)

Thank you everyone for your help. I was able to find a small workaround by doing the following:

vsHTML.Replace("  ", "&nbsp;&nbsp;").Replace(Chr(9), "&nbsp;&nbsp;&nbsp;&nbsp;").Replace(Chr(160), "&nbsp;").Replace(vbCrLf, "<br />")

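The actual code does not display properly here, but the first Replace substitutes &nbsp; entities for spaces, the second substitutes &nbsp; entities for Chr(9) (tab), and the third substitutes &nbsp; for Chr(160) (non-breaking space).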
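
The replacements can be wrapped in a small helper, sketched below. This is not the poster's exact code: the module and function names are hypothetical, and the entity counts (two per double space, four per tab) are simply the ones visible in the line above. `String.Replace` returns a new string rather than modifying the original, so the result has to be assigned or returned. The sketch uses `ChrW(160)` instead of `Chr(160)` so that the character is U+00A0 regardless of the system code page.

```vbnet
Imports System

Public Module WhitespaceHelper

    ' Hypothetical helper: turns whitespace that HTML rendering would
    ' collapse into explicit &nbsp; entities and <br /> tags.
    Public Function PreserveWhitespace(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return html

        Dim result As String = html
        ' Each pair of consecutive spaces becomes two non-breaking spaces.
        result = result.Replace("  ", "&nbsp;&nbsp;")
        ' A tab becomes four non-breaking spaces.
        result = result.Replace(vbTab, "&nbsp;&nbsp;&nbsp;&nbsp;")
        ' A literal non-breaking space character (U+00A0) becomes the entity.
        result = result.Replace(ChrW(160).ToString(), "&nbsp;")
        ' Windows line endings become explicit HTML line breaks.
        result = result.Replace(vbCrLf, "<br />")
        Return result
    End Function

End Module
```

It would be applied just before parsing, for example `worker.Parse(New StringReader(WhitespaceHelper.PreserveWhitespace(vsHTML)))`. A single isolated space is left untouched, and a bare line feed without a carriage return is not converted.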