Docx to PDF conversion produces replacement characters

Time: 2017-05-04 10:30:15

Tags: java linux pdf docx docx4j

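I have a docx file that contains Chinese characters and text in other Asian languages. On my laptop I can convert the docx file to PDF perfectly, with the Chinese characters correctly embedded in the PDF. When the same code runs as a runnable jar on a Linux server, however, the Chinese characters are replaced with # symbols. Can someone point me in the right direction? Thanks for your help. The Java code is shown below: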

getContentView().setOnTouchListener(...);

1 answer:

Answer 0: (score: 1)

Copied from the docx4j "Getting Started" documentation:

docx4j can only use fonts which are available to it.

These fonts come from 2 sources:
•   those installed on the computer
•   those embedded in the document

Note that Word silently performs font substitution.  When you open an existing document in 
Word, and select text in a particular font, the actual font you see on the screen won't be 
the font reported in the ribbon if it is not installed on your computer or embedded in the 
document.  To see whether Word 2007 is substituting a font, go into Word Options 
> Advanced > Show Document Content and press the "Font Substitution" button.  

Word's font substitution information is not available to docx4j.  As a developer, you have 3 
options:
•   ensure the font is installed or embedded
•   tell docx4j which font to use instead, or
•   allow docx4j to fall back to a default font

To embed a font in a document, open it in Word on a computer which has the font installed 
(check no substitution is occurring), and go to Word Options > Save > Embed Fonts in File.

If you want to tell docx4j to use a different font, you need to add a font mapping.  The 
FontMapper interface is used to do this.

On a Windows computer, font names for installed fonts are mapped 1:1 to the corresponding 
physical fonts via the IdentityPlusMapper. 

A font mapper contains a Map<String, PhysicalFont>. To add a font mapping, follow the example in the ConvertOutPDF sample:
    // Set up font mapper
    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);

    // Example: Times New Roman lacks certain Arabic glyphs, e.g.
    //   Glyph "ي" (0x64a, afii57450) not available in font "TimesNewRomanPS-ItalicMT".
    //   Glyph "ج" (0x62c, afii57420) not available in font "TimesNewRomanPS-ItalicMT".
    // so map it to a font which does have them.
    // Make sure this font is matched by your regex (if any)!
    PhysicalFont font = PhysicalFonts.get("Arial Unicode MS");
    if (font != null) {
        fontMapper.put("Times New Roman", font);
        fontMapper.put("Arial", font);
    }

You'll see the font names if you configure log4j debug level logging for
 org.docx4j.fonts.PhysicalFonts

If you turn on logging for org.docx4j.fonts, it should tell you which glyphs are missing. See https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/fonts/GlyphCheck.java
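
Before changing any mapping, it can help to confirm that the Linux server has a font covering Chinese text at all. The following is a minimal sketch using only the JDK's java.awt.Font API, not docx4j. The class name FontCoverageCheck and the sample string are made up for illustration. docx4j discovers fonts through its own mechanism, so the output is only a rough indication of what is installed on the machine.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of docx4j: lists JVM-visible fonts that cover a sample string.
public class FontCoverageCheck {

    /** Returns the names of the candidate fonts that can display every character of the sample. */
    static List<String> fontsCovering(String sample, Font[] candidates) {
        List<String> names = new ArrayList<>();
        for (Font font : candidates) {
            // canDisplayUpTo returns -1 when the font has a glyph for every character
            if (font.canDisplayUpTo(sample) == -1) {
                names.add(font.getFontName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample Chinese text, written as Unicode escapes so the source file encoding does not matter
        String sample = args.length > 0 ? args[0] : "\u4e2d\u6587\u5b57\u7b26";
        Font[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        List<String> covering = fontsCovering(sample, installed);
        if (covering.isEmpty()) {
            System.out.println("No font visible to the JVM covers the sample text.");
        } else {
            covering.forEach(System.out::println);
        }
    }
}
```

If this prints no font names on the server but several on the laptop, the server most likely lacks a CJK-capable font. In that case, installing one there, or embedding the font in the document as described above, would be the first thing to try before adding a font mapping.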