Question

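Below is the code I use to convert a Word document to PDF. After compiling and running it, a PDF file is generated, but the file contains some garbage characters along with the content of the Word document. Please help me figure out what I should change to get rid of the garbage characters. The code I am using is: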

import com.lowagie.text.Document; 
import com.lowagie.text.Paragraph; 
import com.lowagie.text.pdf.PdfWriter; 
import java.io.File; 
import java.io.FileOutputStream; 



public class PdfConverter
{

    private void createPdf(String inputFile, String outputFile)//, boolean isPictureFile)
    {
        Document pdfDocument = new Document();
        String pdfFilePath = outputFile;
        try
        {
            FileOutputStream fileOutputStream = new FileOutputStream(pdfFilePath);
            PdfWriter writer = null;
            writer = PdfWriter.getInstance(pdfDocument, fileOutputStream);
            writer.open();
            pdfDocument.open();
            /*if (isPictureFile)
            {
                pdfDocument.add(com.lowagie.text.Image.getInstance(inputFile));
            }
            else
            { */
            File file = new File(inputFile);
            pdfDocument.add(new Paragraph(org.apache.commons.io.FileUtils.readFileToString(file)));
            //}
            pdfDocument.close();
            writer.close();
            System.out.println("PDF has been generated");
        }
        catch (Exception exception)
        {
            System.out.println("Document Exception!" + exception);
        }
    }

    public static void main(String args[])
    {
        PdfConverter pdfConversion = new PdfConverter();
        pdfConversion.createPdf("C:/test.doc", "C:/test.pdf");//, true);
    }
}

Thanks for your help.

Answer 1

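Just because you named your class PdfConverter doesn't make it one. All you are doing is reading the binary content as a String and writing it out as a paragraph (which is exactly what you see). This approach is never going to work. For a similar question, see https://stackoverflow.com/questions/437394.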
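To see why the file is not text, it can help to look at its first bytes. The sketch below is a minimal illustration that uses only the JDK (Java 9+ for `readNBytes`); the class name `WordFormatSniffer` and the default path are illustrative, not part of any library. It relies on two well-known file signatures: a legacy Word 97-2003 `.doc` is an OLE2 Compound File starting with `D0 CF 11 E0 A1 B1 1A E1`, and a `.docx` is a ZIP archive starting with `PK\x03\x04`. Neither is plain text, so reading either one with `readFileToString` can only produce garbage.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class WordFormatSniffer {
    // OLE2 Compound File signature used by binary .doc files (Word 97-2003)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature used by .docx (Office Open XML)
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a file by its leading bytes
    static String detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return "OLE2 (binary .doc)";
        if (startsWith(header, ZIP_MAGIC)) return "ZIP (.docx)";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "C:/test.doc");
        byte[] header = new byte[8];
        try (InputStream in = Files.newInputStream(path)) {
            // Read up to 8 bytes; shorter files just yield a shorter header
            int n = in.readNBytes(header, 0, header.length);
            header = Arrays.copyOf(header, n);
        }
        System.out.println(path + ": " + detect(header));
    }
}
```

Knowing which format you have also tells you which reader you need: the binary `.doc` and the ZIP-based `.docx` are parsed by different parts of Apache POI.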

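If you are only interested in the content of the Word document, you may want to try Apache POI - the Java API for Microsoft Documents, which reads the document not at the binary level but at a higher level of abstraction. If your Word document has a simple (and I mean really simple) structure, you may get reasonable results.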
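A minimal sketch of that approach, assuming Apache POI's HWPF module (the `poi-scratchpad` artifact) and iText 2.x (`com.lowagie`, as in the question) are on the classpath. It extracts only the paragraph text of a binary `.doc` with `WordExtractor` and writes one PDF paragraph per Word paragraph; formatting, tables, and images are lost, and a `.docx` would need POI's XWPF classes instead. The class `DocToPdfText` and its `cleanParagraph` helper are hypothetical names for illustration.

```java
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class DocToPdfText {

    // Hypothetical helper: drops ASCII control characters (e.g. the \r paragraph
    // marks and \u0007 cell marks Word stores in the text) except tabs, then trims.
    static String cleanParagraph(String raw) {
        return raw.replaceAll("[\\p{Cntrl}&&[^\\t]]", "").trim();
    }

    static void convert(String inputFile, String outputFile)
            throws IOException, DocumentException {
        try (InputStream in = new FileInputStream(inputFile);
             OutputStream out = new FileOutputStream(outputFile)) {
            // HWPFDocument parses the binary Word 97-2003 format instead of
            // treating its bytes as a String
            WordExtractor extractor = new WordExtractor(new HWPFDocument(in));

            Document pdfDocument = new Document();
            PdfWriter.getInstance(pdfDocument, out);
            pdfDocument.open();
            for (String raw : extractor.getParagraphText()) {
                // stripFields removes embedded field codes (page numbers, links, ...)
                String text = cleanParagraph(WordExtractor.stripFields(raw));
                if (!text.isEmpty()) {
                    pdfDocument.add(new Paragraph(text));
                }
            }
            // Closing the Document flushes and finishes the PDF
            pdfDocument.close();
        }
    }

    public static void main(String[] args) throws Exception {
        convert("C:/test.doc", "C:/test.pdf");
        System.out.println("PDF has been generated");
    }
}
```

Note that iText's default font only covers Latin text; documents in other scripts would additionally need an embedded font passed to each `Paragraph`.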

Answer 2

To do this, you have to read the doc file properly and then create the PDF file from the data you read.

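What you are doing now is reading the raw data of the doc file without using a proper API to interpret it, and then storing that garbage data in the PDF file, which is why the file contains garbage values.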
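The "garbage" is simply what arbitrary binary bytes look like once they are decoded as characters. The stdlib-only sketch below assumes UTF-8 decoding for illustration; `FileUtils.readFileToString(File)` without a charset uses the platform default, so the exact junk you see may differ. The class name `GarbageDemo` and the sample bytes (the OLE2 signature of a `.doc` followed by a few zero bytes) are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class GarbageDemo {
    // Counts characters that render as junk: U+FFFD replacement characters
    // (undecodable bytes) and control characters other than common whitespace
    static long countJunk(String s) {
        return s.chars()
                .filter(c -> c == 0xFFFD
                        || (Character.isISOControl(c) && c != '\n' && c != '\r' && c != '\t'))
                .count();
    }

    public static void main(String[] args) {
        // First bytes of a binary .doc file (OLE2 signature) plus some zero padding
        byte[] docHeader = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1,
            0x00, 0x00, 0x00, 0x00
        };
        // This is essentially what readFileToString does: decode bytes as text
        String decoded = new String(docHeader, StandardCharsets.UTF_8);
        System.out.println("junk characters: " + countJunk(decoded)
                + " of " + decoded.length());
    }
}
```

Passing such a string to `new Paragraph(...)` faithfully prints the junk, which is why the fix has to happen at the reading stage, not in the PDF-writing code.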

Error converting a Word document to PDF using iText
