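We are using iText 2.1.7.
We have an embedded rich text editor (CKEditor) whose content (HTML) is stored in a database. The editor allows the content to be formatted (bold, italic).
We generate a PDF from this HTML content with the HTMLWorker.parseToList method. It works well and renders formatted content correctly, except when some diacritics are formatted as bold or italic (see the capture below).
Some code that reproduces the failing behaviour: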
ArrayList elements;
Font diacriticReadyFont = FontFactory.getFont("/images/arial.ttf", BaseFont.IDENTITY_H, true);

// Add one normally styled paragraph with Czech diacritics
Paragraph p1 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p>A normal style paragraph with Czech diacritics shows fine : Č,Ć,Š,Ž,Đ</p>"), null);
for (Object element : elements) {
    p1.add(element);
}
getDocument().add(p1);

// Add one mixed style paragraph with standard characters
Paragraph p2 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p>A paragraph with some <em>italic text </em>and <strong>bold text </strong>shows fine</p>"), null);
for (Object element : elements) {
    p2.add(element);
}
getDocument().add(p2);

// Add one bold style paragraph with Czech diacritics
Paragraph p3 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p><strong>However, bold text with Czech diacritics Č,Ć,Š,Ž,Đ will miss some of those diacritics</strong></p>"), null);
for (Object element : elements) {
    p3.add(element);
}
getDocument().add(p3);

// Add one italic style paragraph with Czech diacritics
Paragraph p4 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p><em>Also, italic text with Czech diacritics Č,Ć,Š,Ž,Đ will miss some too</em></p>"), null);
for (Object element : elements) {
    p4.add(element);
}
getDocument().add(p4);

// Forcing the font on "element" paragraphs does not help
Paragraph p5 = new Paragraph("", diacriticReadyFont);
elements = HTMLWorker.parseToList(new StringReader("<p><strong>Forcing the font on \"element\" paragraphs does not help : Č,Ć,Š,Ž,Đ</strong></p>"), null);
for (Object element : elements) {
    ((Paragraph) element).setFont(diacriticReadyFont);
    p5.add(element);
}
getDocument().add(p5);
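This gives:
From my analysis (with the help of this excellent post: Can't get Czech characters while generating a PDF), it seems that the font HTMLWorker automatically applies to formatted (bold or italic) text is the culprit. As the fifth paragraph of the example shows, manually forcing the font on the parsed elements does not help.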
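One way to narrow this down without touching iText at all is to check which of the five test characters exist in windows-1252, the code page that the WinAnsi font encoding corresponds to. This is only a sketch built on an assumption I have not confirmed: that the bold and italic chunks end up in a WinAnsi-encoded font instead of the Identity-H one. WinAnsiCheck and encodable are hypothetical names made up for this check, not iText classes.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of iText: reports which characters can be
// represented in windows-1252 (the code page behind the WinAnsi encoding).
class WinAnsiCheck {

    // Assumption: the styled text falls back to a WinAnsi-encoded font.
    private static final Charset WIN_ANSI = Charset.forName("windows-1252");

    static boolean encodable(char c) {
        // A fresh encoder per call keeps the helper stateless.
        CharsetEncoder encoder = WIN_ANSI.newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // The five sample characters from the question, in the same order,
        // written as Unicode escapes to avoid source-encoding issues.
        char[] samples = {'\u010C', '\u0106', '\u0160', '\u017D', '\u0110'};
        for (char c : samples) {
            System.out.printf("U+%04X encodable in windows-1252: %b%n", (int) c, encodable(c));
        }
    }
}
```

If the glyphs missing from the bold and italic paragraphs are exactly the ones reported as not encodable here, that would support the idea that the styled text is rendered with a different encoding than the Identity-H font set on the outer paragraph.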
Any insights?