Skip adding empty tables to the PDF when parsing XHTML with iTextSharp

Posted: 2014-06-30 19:14:43

Tags: pdf pdf-generation itextsharp itext html-table

iTextSharp throws an error when you try to create a PdfPTable with 0 columns.

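I need to take XHTML generated by an XSLT transformation and produce a PDF from it. I'm currently doing this with iTextSharp. The problem is that the generated XHTML sometimes contains tables with 0 rows, so when iTextSharp tries to parse them into tables it throws an error saying the table has 0 columns.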

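It reports 0 columns because iTextSharp sets a table's column count to the largest number of columns found in any of its rows. With no rows at all, that maximum is 0.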
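The column-count rule described above can be modeled in a few lines. This is a hypothetical sketch of the rule as the question describes it, not iTextSharp's actual source; ColumnCountModel and ColumnCount are illustrative names.

```csharp
using System.Collections.Generic;
using System.Linq;

//Hypothetical model of the rule described above (not iTextSharp source):
//a table's column count is the largest number of cells found in any row
public static class ColumnCountModel
{
    public static int ColumnCount(IEnumerable<int> cellsPerRow)
    {
        //DefaultIfEmpty(0) makes Max() return 0 for a table with no rows
        //instead of throwing InvalidOperationException on an empty sequence
        return cellsPerRow.DefaultIfEmpty(0).Max();
    }
}
```

ColumnCount(new[] { 2, 3, 1 }) returns 3, while ColumnCount(new int[0]) returns 0, which is exactly the zero-column case the PdfPTable constructor rejects.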

How can I catch these HTML table declarations with 0 rows and stop them from being parsed into PDF elements?

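I found that the code causing the error lives in HtmlPipeline. I could copy its implementation into a class that extends HtmlPipeline, override its methods, and add my own check for empty tables there, but that seems sloppy and inefficient.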

Is there a way to catch empty tables before they are parsed?

=Solution=

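Tag processor

using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.html.table;

//Replaces the default <table> processor and skips tables that produced no content
public class EmptyTableTagProcessor : Table
{
    public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent)
    {
        //Only hand the table to the default processor when its children produced some content
        if (currentContent.Count > 0)
        {
            return base.End(ctx, tag, currentContent);
        }

        //An empty table contributes no elements to the PDF
        return new List<IElement>();
    }
}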

Using the tag processor...

//Assumes document (an open Document), pdfWriter (its PdfWriter) and html (the XHTML string) are defined elsewhere

//CSS
var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);

//HTML
var fontProvider = new XMLWorkerFontProvider();
var cssAppliers = new CssAppliersImpl(fontProvider);

//Register the custom processor for the "table" tag
var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
tagProcessorFactory.AddProcessor(new EmptyTableTagProcessor(), new string[] { "table" });

var htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.SetTagFactory(tagProcessorFactory);

//PIPELINE
var pipeline =
    new CssResolverPipeline(cssResolver,
    new HtmlPipeline(htmlContext,
    new PdfWriterPipeline(document, pdfWriter)));

//XML WORKER
var xmlWorker = new XMLWorker(pipeline, true);

//XML PARSER: dispatches parse events to the worker
var xmlParser = new XMLParser();
xmlParser.AddListener(xmlWorker);

using (var stringReader = new StringReader(html))
{
    xmlParser.Parse(stringReader);
}

This solution drops empty table tags and still writes the PDF as part of the pipeline.
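If the XHTML is well-formed XML with a single root element (typical of XSLT output), another way to catch empty tables before parsing is to remove them from the markup before it reaches XMLWorker at all. This is a minimal sketch using LINQ to XML, not part of the original solution; EmptyTableFilter and RemoveEmptyTables are hypothetical names, and "empty" is assumed to mean a table with no tr descendants.

```csharp
using System.Linq;
using System.Xml.Linq;

//Hypothetical pre-processing helper (not part of iTextSharp)
public static class EmptyTableFilter
{
    //Removes every <table> that has no <tr> anywhere inside it.
    //Matching on LocalName works with or without the XHTML namespace.
    public static string RemoveEmptyTables(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        doc.Descendants()
           .Where(e => e.Name.LocalName == "table")
           .Where(t => !t.Descendants().Any(d => d.Name.LocalName == "tr"))
           .Remove(); //LINQ to XML's Remove() works on a snapshot, so removing while iterating is safe

        //Serialize without adding indentation or an XML declaration
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

The cleaned string can then be passed to the existing reader, e.g. new StringReader(EmptyTableFilter.RemoveEmptyTables(html)). XDocument.Parse throws on markup that is not well-formed XML, including fragments with more than one root element, so this approach only fits input that is a complete document.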

1 answer:

Answer 0 (score: 2)

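You should be able to write your own tag processor that handles this scenario by subclassing iTextSharp.tool.xml.html.AbstractTagProcessor. In fact, to make your life easier, you can subclass the existing, more specific iTextSharp.tool.xml.html.table.Table: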

public class TableTagProcessor : iTextSharp.tool.xml.html.table.Table {

    public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent) {
        //See if we've got anything to work with
        if (currentContent.Count > 0) {
            //If so, let our parent class worry about it
            return base.End(ctx, tag, currentContent);
        }

        //Otherwise return an empty list which should make everyone happy
        return new List<IElement>();
    }
}

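Unfortunately, if you want to use a custom tag processor you can't use the XMLWorkerHelper shortcut class. Instead, you need to parse the HTML into elements and add them to the document yourself. To do that you need an instance of iTextSharp.tool.xml.IElementHandler, which you can create like this: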

public class SampleHandler : iTextSharp.tool.xml.IElementHandler {
    //Generic list of elements
    public List<IElement> elements = new List<IElement>();
    //Add the supplied item to the list
    public void Add(IWritable w) {
        if (w is WritableElement) {
            elements.AddRange(((WritableElement)w).Elements());
        }
    }
}

You can use the above with the following code, which includes some sample invalid HTML:

//Hold everything in memory
using (var ms = new MemoryStream()) {

    //Create new PDF document 
    using (var doc = new Document()) {
        using (var writer = PdfWriter.GetInstance(doc, ms)) {

            doc.Open();

            //Sample HTML
            string html = "<table><tr><td>Hello</td></tr></table><table></table>";

            //Create an instance of our element helper
            var XhtmlHelper = new SampleHandler();

            //Begin pipeline
            var htmlContext = new HtmlPipelineContext(null);

            //Get the default tag processor
            var tagFactory = iTextSharp.tool.xml.html.Tags.GetHtmlTagProcessorFactory();

            //Add an instance of our new processor
            tagFactory.AddProcessor(new TableTagProcessor(), new string[] { "table" });

            //Bind the above to the HTML context part of the pipeline
            htmlContext.SetTagFactory(tagFactory);

            //Get the default CSS handler and create some boilerplate pipeline stuff
            var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(false);
            var pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new ElementHandlerPipeline(XhtmlHelper, null)));//Here's where we add our IElementHandler

            //The worker dispatches commands to the pipeline stuff above
            var worker = new XMLWorker(pipeline, true);

            //Create a parser with the worker listed as the dispatcher
            var parser = new XMLParser();
            parser.AddListener(worker);

            //Finally, parse our HTML directly.
            using (TextReader sr = new StringReader(html)) {
                parser.Parse(sr);
            }

            //The above did not touch our document. Instead, all "proper" elements are stored in our helper class XhtmlHelper
            foreach (var element in XhtmlHelper.elements) {
                //Add these to the main document
                doc.Add(element);
            }

            doc.Close();

        }
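    }

    //The finished PDF can now be read from the MemoryStream (ToArray works even after the stream is closed)
    var pdfBytes = ms.ToArray();
}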