Question

I am using FreeTextBox.dll to get user input and storing that information in HTML format in the database. A sample of the user's input is below:

                                                                     133 Peachtree St NE
                                                                     Atlanta, GA 30303
                                                                     404-652-7777

                                                                     Cindy Cooley
                                                                     www.somecompany.com
                                                                     Product Stewardship Mgr

9/9/2011
Deidre's Company
123 Test St
Atlanta, GA 30303

Test test.

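I want HTMLWorker to preserve the whitespace the user entered, but it strips it out. Is there a way to keep the user's whitespace? Below is an example of how I create the PDF document.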

Public Shared Sub CreatePreviewPDF(ByVal vsHTML As String, ByVal vsFileName As String)

        Dim output As New MemoryStream()
        Dim oDocument As New Document(PageSize.LETTER)
        Dim writer As PdfWriter = PdfWriter.GetInstance(oDocument, output)
        Dim oFont As New Font(Font.FontFamily.TIMES_ROMAN, 8, Font.NORMAL, BaseColor.BLACK)

        Using output
            Using writer
                Using oDocument
                    oDocument.Open()
                    Using sr As New StringReader(vsHTML)
                        Using worker As New html.simpleparser.HTMLWorker(oDocument)

                            worker.StartDocument()
                            worker.SetInsidePRE(True)
                            worker.Parse(sr)
                            worker.EndDocument()
                            worker.Close()
                            oDocument.Close()

                        End Using
                    End Using

                    HttpContext.Current.Response.ContentType = "application/pdf"
                    HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("attachment;filename={0}.pdf", vsFileName))
                    HttpContext.Current.Response.BinaryWrite(output.ToArray())
                    HttpContext.Current.Response.End()

                End Using
            End Using
            output.Close()
        End Using


    End Sub

Answer 1

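iText and iTextSharp have a small glitch here, but if you don't mind downloading the source and recompiling it, you can fix it fairly easily. You need to change two files. Every change I made is commented inline in the code. Line numbers are based on the 5.1.2.0 code, rev 240.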

The first is in iTextSharp.text.html.HtmlUtilities.cs. Look for the function EliminateWhiteSpace at line 249 and change it to:

    public static String EliminateWhiteSpace(String content) {
        // multiple spaces are reduced to one,
        // newlines are treated as spaces,
        // tabs, carriage returns are ignored.
        StringBuilder buf = new StringBuilder();
        int len = content.Length;
        char character;
        bool newline = false;
        bool space = false;//Detect whether we have written at least one space already
        for (int i = 0; i < len; i++) {
            switch (character = content[i]) {
            case ' ':
                if (!newline && !space) {//If we are not at a new line AND ALSO did not just append a space
                    buf.Append(character);
                    space = true;  //flag that we just wrote a space
                }
                break;
            case '\n':
                if (i > 0) {
                    newline = true;
                    buf.Append(' ');
                }
                break;
            case '\r':
                break;
            case '\t':
                break;
            default:
                newline = false;
                space = false;  //reset flag
                buf.Append(character);
                break;
            }
        }
        return buf.ToString();
    }

The second change is in iTextSharp.text.xml.simpleparser.SimpleXMLParser.cs. In the function Go, which starts at line 185, change line 248 to:

if (html /*&& nowhite*/) {//removed the nowhite check from here because that should be handled by the HTML parser later, not the XML parser

Answer 2

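I would recommend using wkhtmltopdf instead of iText. wkhtmltopdf outputs the HTML exactly as WebKit (Google Chrome, Safari) renders it, rather than going through iText's conversion. It is just a binary that you can call. That being said, I would check the HTML to make sure the user input actually contains paragraphs and/or line breaks. They might be getting stripped out before the conversion.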
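For anyone who wants to try this from VB.NET, here is a rough sketch of calling the binary with Process.Start. It assumes the HTML has already been written to a file on disk, and it relies only on the basic `wkhtmltopdf <input> <output>` form of the command line. The module name, the helper names, and all paths are made up for illustration, so adjust them to wherever wkhtmltopdf is installed on your machine.

```vb
Imports System
Imports System.Diagnostics

Module WkhtmltopdfSketch

    ' Hypothetical helper: quotes both paths so spaces in them survive.
    Public Function BuildArguments(ByVal inputHtmlPath As String, ByVal outputPdfPath As String) As String
        Return String.Format("""{0}"" ""{1}""", inputHtmlPath, outputPdfPath)
    End Function

    ' Hypothetical helper: runs wkhtmltopdf and reports whether it exited with code 0.
    Public Function ConvertHtmlFileToPdf(ByVal exePath As String, ByVal inputHtmlPath As String, ByVal outputPdfPath As String) As Boolean
        Dim psi As New ProcessStartInfo()
        psi.FileName = exePath
        psi.Arguments = BuildArguments(inputHtmlPath, outputPdfPath)
        psi.UseShellExecute = False
        psi.CreateNoWindow = True

        Using proc As Process = Process.Start(psi)
            proc.WaitForExit()
            Return proc.ExitCode = 0
        End Using
    End Function

    Sub Main()
        ' All three paths below are assumptions, not defaults of any installer.
        Dim ok As Boolean = ConvertHtmlFileToPdf("C:\tools\wkhtmltopdf.exe", "C:\temp\letter.html", "C:\temp\letter.pdf")
        Console.WriteLine(If(ok, "PDF written", "wkhtmltopdf reported an error"))
    End Sub

End Module
```

The resulting PDF file can then be read back and sent with Response.BinaryWrite, the same way the question's code sends the MemoryStream.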

Answer 3

Thanks everyone for the help. I was able to find a small workaround by doing the following:

vsHTML.Replace("  ", "&nbsp;&nbsp;").Replace(Chr(9), "&nbsp;&nbsp;&nbsp;&nbsp;").Replace(Chr(160), "&nbsp;").Replace(vbCrLf, "<br />")

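The actual code may not display properly here, so in words: the first Replace swaps each pair of spaces for two &amp;nbsp; entities, Chr(9) (tab) becomes four &amp;nbsp; entities, Chr(160) (the non-breaking space character) becomes a single &amp;nbsp; entity, and vbCrLf becomes a &lt;br /&gt; tag.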
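One thing to watch out for: String.Replace returns a new string and does not modify vsHTML, so the result has to be captured. Here is a minimal sketch of the same Replace chain wrapped in a function. The module and function names are invented for this example, and ChrW(160) is used in place of Chr(160) on the assumption that you want the Unicode non-breaking space regardless of the current code page.

```vb
Imports System

Module WhitespaceWorkaround

    ' Hypothetical helper wrapping the Replace chain from this answer.
    Public Function PreserveWhitespace(ByVal vsHTML As String) As String
        If String.IsNullOrEmpty(vsHTML) Then
            Return String.Empty
        End If

        ' Each Replace returns a new string, so the result is reassigned every time.
        Dim result As String = vsHTML
        result = result.Replace("  ", "&nbsp;&nbsp;")                  ' pairs of spaces
        result = result.Replace(vbTab, "&nbsp;&nbsp;&nbsp;&nbsp;")     ' tab, same as Chr(9)
        result = result.Replace(ChrW(160).ToString(), "&nbsp;")        ' non-breaking space character
        result = result.Replace(vbCrLf, "<br />")                      ' Windows line breaks
        Return result
    End Function

    Sub Main()
        ' Prints: a&nbsp;&nbsp;b&nbsp;&nbsp;&nbsp;&nbsp;c<br />d
        Console.WriteLine(PreserveWhitespace("a  b" & vbTab & "c" & vbCrLf & "d"))
    End Sub

End Module
```

In the question's code the result would be passed to the reader, for example New StringReader(PreserveWhitespace(vsHTML)), before calling worker.Parse.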

iTextSharp HTML to PDF preserving spaces
