Question

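I currently have a working version that creates a PDF from multiple rows in a database. For each row, it creates a new page in the PDF. This all works well. Now I need to parse some of the fields in each row so that their HTML renders correctly. I can see an example here that shows parsing a whole document, but it takes one entire string and parses it as a complete document.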

What I need is to create individually formatted pages in which only specific HTML fields are parsed. Is this possible?

Here is the sample code I use to create the new pages:

PdfFont fTimes = PdfFontFactory.CreateFont(FontConstants.TIMES_ROMAN);
PdfFont fTimesBold = PdfFontFactory.CreateFont(FontConstants.TIMES_BOLD);                    

// create the first page here
doc.Add(new Paragraph("Abstract Submissions for " + eventName).SetFont(fTimes).SetFontSize(18).SetFontColor(Color.BLACK));
doc.Add(new Paragraph("Section Name: " + GetSectionName(ddlSections.SelectedValue)).SetFont(fTimes).SetFontSize(14).SetFontColor(Color.BLACK));
doc.Add(new Paragraph("Created:  " + DateTime.Now.ToString("dddd, MMMM d, yyyy h:mm tt")).SetFont(fTimes).SetFontSize(11).SetFontColor(Color.BLACK));

// iterate through each of the items
foreach (DataRow row in dsItems.Tables[0].Rows)
{
    // create a new page for each abstract submission
    doc.Add(new AreaBreak(iText.Layout.Properties.AreaBreakType.NEXT_PAGE));
    doc.Add(new Paragraph(ValidationHelper.GetString(row["PresentationType"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK));
    doc.Add(new Paragraph(ValidationHelper.GetString(row["PresentationTitle"], "")).SetFont(fTimes).SetFontSize(16).SetFontColor(Color.BLACK));
    // html field
    doc.Add(new Paragraph(ValidationHelper.GetString(row["Authors"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK));
    // html field
    doc.Add(new Paragraph(ValidationHelper.GetString(row["Abstract"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK));
}

doc.Close();

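I should note that I'm using a MemoryStream rather than a FileStream, so the client can download the PDF immediately without it being saved to the file system.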

**Edit - added sample data**

<table>
    <tr>
        <td>Poster</td>
        <td>Abstract 1</td>
        <td><strong><em>Doctor Name 1</em></strong> <strong>Doctor Name 2</strong></td>
        <td><p>Some really long text <strong>which can have</strong> some different basic HTML <u>formatting in it</u></p></td>
    </tr>
    <tr>
        <td>Presentation</td>
        <td>Abstract 2</td>
        <td><strong>Doctor Name 15 </strong><em>Doctor 3</em></td>
        <td><p>Some really long text which can have some different basic HTML <em>formatting in it</em></p></td>
    </tr>
</table>

Answer 1

Using a pattern like this, you can create your own XML/HTML-to-iText converter. You only need to implement the tags you actually use:

internal interface ICustomElement { IEnumerable<IElement> GetContent(); }

internal class CustomElementFactory {
  public ICustomElement GetElement(XmlNode node) {
    switch (node.Name) {
      case "p": return new CustomParagraph (node, this);
      // implement the tags you need using the ICustomElement interface
      default: return new CustomParagraph(node, this); // e.g. treat unknown nodes as plain text
    }
  }
}

public class PdfCreator {
  public byte[] GetPdf(XmlDocument template) {
    PdfDocument doc ...
    CustomElementFactory factory ...
    foreach(XmlNode node in template.ChildNodes) {
      doc.AddElements(factory.GetElement(node).GetContent()); 
      // This works in such an easy, generic way because almost every iText element implements the IElement interface and can therefore be added to the document like this. Containers such as PdfPCell accept IElements as well.
      // Good job, iText guys! ;)
    }

    return doc.CloseDocument();
  }
}

// here comes the magic:

internal class CustomParagraph : ICustomElement {
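  // the constructor stores the XmlNode and the factory in private fields
  private readonly XmlNode node;
  private readonly CustomElementFactory factory;

  public CustomParagraph(XmlNode node, CustomElementFactory factory) {
    this.node = node;
    this.factory = factory;
  }
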
  public IEnumerable<IElement> GetContent() {
    Paragraph p = new Paragraph();
    p.Add(node.InnerText); // create an underlined, bold, or otherwise styled font here when you implement the special HTML tags

    // If the node has child elements, get their content by calling factory.GetElement(child).GetContent() for each child. Then loop over the IElement.Chunks collection of each IElement and add the contained chunks to the paragraph of this scope. This way you can process nested HTML tags recursively.
    // Find a way to pass the style information of this scope to the factory when processing child nodes, so that you can render <strong>bold<u>underlinedANDBOLD</u></strong> content correctly.

    return new List<IElement> { p };
  }
}

This takes some work and fine-tuning, but it can be done.
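The recursive part described in the comments of CustomParagraph (passing the style of the current scope down to child nodes) can be sketched without any iText dependency. The following is only a minimal illustration, assuming C# 9 or later and HTML fields that are well-formed XML (so no unclosed `<br>` and no entities such as `&nbsp;`). The names `RunStyle`, `StyledRun`, and `HtmlFragmentFlattener` are made up for this sketch and are not part of iText.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Style flags inherited from enclosing tags (hypothetical type for this sketch).
[Flags]
public enum RunStyle { None = 0, Bold = 1, Italic = 2, Underline = 4 }

// One piece of text together with the style in effect where it occurs.
public sealed record StyledRun(string Text, RunStyle Style);

public static class HtmlFragmentFlattener
{
    // Parses a small, well-formed HTML fragment and returns its styled text runs.
    public static List<StyledRun> Flatten(string htmlFragment)
    {
        var xml = new XmlDocument { PreserveWhitespace = true };
        // Wrap the fragment in a root element so several top-level nodes can be loaded.
        xml.LoadXml("<root>" + htmlFragment + "</root>");

        var runs = new List<StyledRun>();
        Walk(xml.DocumentElement, RunStyle.None, runs);
        return runs;
    }

    // Visits the children of a node, passing the accumulated style down the tree.
    private static void Walk(XmlNode node, RunStyle inherited, List<StyledRun> runs)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            switch (child.NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    runs.Add(new StyledRun(child.Value ?? "", inherited));
                    break;
                case XmlNodeType.Element:
                    // Unknown tags (p, span, ...) add no style but are still traversed.
                    Walk(child, inherited | StyleOf(child.Name), runs);
                    break;
            }
        }
    }

    // Maps a tag name to the style it contributes.
    private static RunStyle StyleOf(string tag) => tag.ToLowerInvariant() switch
    {
        "strong" or "b" => RunStyle.Bold,
        "em" or "i" => RunStyle.Italic,
        "u" => RunStyle.Underline,
        _ => RunStyle.None,
    };
}
```

Calling `HtmlFragmentFlattener.Flatten("<strong>bold<u>both</u></strong> plain")` yields three runs: "bold" with Bold, "both" with Bold and Underline, and " plain" with no style. Each run could then be turned into a text element and added to the paragraph for the current row, for example (assuming iText 7's layout API) a `Text` object with `SetBold()`, `SetItalic()`, or `SetUnderline()` applied according to the flags.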

Parsing HTML blocks with iText in a loop on separate pages
