Generating a PDF with iText: some Czech characters are not displayed in paragraphs parsed by HTMLWorker

Asked: 2016-04-28 13:48:00

Tags: java pdf itext special-characters

We are using iText 2.1.7.

We have an embedded rich text editor (CKEditor) whose content (HTML) is stored in a database. The editor allows the content to be formatted (bold, italic).

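We use the HTMLWorker.parseToList method to generate a PDF from this HTML content. It works well and renders formatted content correctly, except when certain diacritic characters are formatted as bold or italic (see the screenshot below).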

Some code that reproduces the failing behavior:

    ArrayList elements;
    Font diacriticReadyFont = FontFactory.getFont("/images/arial.ttf", BaseFont.IDENTITY_H, true);

    // Add one normally styled paragraph with Czech diacritics
    Paragraph p1 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p>A normal style paragraph with Czech diacritics shows fine : Č,Ć,&Scaron;,Ž,Đ</p>"), null);
    for (Object element : elements) {
        p1.add(element);
    }
    getDocument().add(p1);

    // Add one mixed style paragraph with standard characters
    Paragraph p2 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p>A paragraph with some <em>italic text </em>and <strong>bold text </strong>shows fine</p>"), null);
    for (Object element : elements) {
        p2.add(element);
    }
    getDocument().add(p2);

    // Add one bold style paragraph with Czech diacritics
    Paragraph p3 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p><strong>However, bold text with Czech diacritics Č,Ć,&Scaron;,Ž,Đ will miss some of those diacritics</strong></p>"), null);
    for (Object element : elements) {
        p3.add(element);
    }
    getDocument().add(p3);

    // Add one italic style paragraph with Czech diacritics
    Paragraph p4 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p><em>Also, italic text with Czech diacritics Č,Ć,&Scaron;,Ž,Đ will miss some too</em></p>"), null);
    for (Object element : elements) {
        p4.add(element);
    }
    getDocument().add(p4);

    // Forcing the font on "element" paragraphs does not help
    Paragraph p5 = new Paragraph("", diacriticReadyFont);
    elements = HTMLWorker.parseToList(new StringReader("<p><strong>Forcing the font on \"element\" paragraphs does not help : Č,Ć,&Scaron;,Ž,Đ</strong></p>"), null);
    for (Object element : elements) {
        ((Paragraph)element).setFont(diacriticReadyFont);
        p5.add(element);
    }
    getDocument().add(p5);

This produces:

[screenshot of the generated PDF; image not available]

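Based on my analysis (with the help of this excellent post: Can't get Czech characters while generating a PDF), the culprit seems to be the font that HTMLWorker automatically applies to formatted (bold or italic) text. As the fifth paragraph in the example shows, forcing the font manually does not help.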
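One possible explanation, not confirmed anywhere in this thread, is that the bold and italic text ends up in a font that uses iText's WinAnsi encoding (BaseFont.WINANSI, i.e. "Cp1252") instead of Identity-H. That would fit the symptom that only some of the diacritics go missing. The sketch below uses only the JDK, not iText, to check which of the sample characters the Windows-1252 code page can represent. The class name WinAnsiCoverage and the method unencodable are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of iText: lists the characters of a string
// that a given charset cannot encode.
class WinAnsiCoverage {

    // Returns the characters of `text` that `charsetName` cannot encode,
    // in their original order.
    static String unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // The sample characters from the question: Č Ć Š Ž Đ
        String sample = "\u010C\u0106\u0160\u017D\u0110";
        // Assumption: "windows-1252" stands in for the encoding that iText
        // calls WinAnsi / Cp1252.
        System.out.println(unencodable(sample, "windows-1252"));
    }
}
```

Windows-1252 contains Š and Ž but not Č, Ć or Đ, so only those three are reported as unencodable. If the missing glyphs in the PDF are exactly those three, that would point at the encoding of the font used for styled text rather than at the font file itself.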

Any insights?

0 answers:

No answers