在java中将docx转换为pdf

时间:2017-04-12 07:58:47

标签: java pdf ms-word apache-poi

我正在尝试将包含表格和图像的docx文件转换为pdf格式文件。

我一直在各处搜索,但没有得到适当的解决方案,要求提供正确和正确的解决方案:

这里我尝试过:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class TestCon {

    public static void main(String[] args) {
        TestCon cwoWord = new TestCon();
        System.out.println("Start");
        cwoWord.ConvertToPDF("D:\\Test.docx", "D:\\Test1.pdf");
    }

    public void ConvertToPDF(String docPath, String pdfPath) {
        try {
            InputStream doc = new FileInputStream(new File(docPath));
            XWPFDocument document = new XWPFDocument(doc);
            PdfOptions options = PdfOptions.create();
            OutputStream out = new FileOutputStream(new File(pdfPath));
            PdfConverter.getInstance().convert(document, out, options);
            System.out.println("Done");
        } catch (FileNotFoundException ex) {
            System.out.println(ex.getMessage());
        } catch (IOException ex) {

            System.out.println(ex.getMessage());
        }
    }

}

例外:

Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log(ILjava/lang/Object;)V from class org.apache.poi.openxml4j.opc.PackageRelationshipCollection
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:313)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:162)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:130)
at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:559)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:239)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:665)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:121)
at test.TestCon.ConvertToPDF(TestCon.java:31)
at test.TestCon.main(TestCon.java:25)

我的要求是创建一个java代码,将现有的docx转换为具有正确格式和对齐方式的pdf。

请建议。

使用的罐子:

Updated jars

10 个答案:

答案 0 :(得分:17)

除了VivekRatanSinha answer之外,我还想为将来需要它的人发布完整的代码和必需的jar。

<强>代码:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordConvertPDF {
    public static void main(String[] args) {
        WordConvertPDF cwoWord = new WordConvertPDF();
        cwoWord.ConvertToPDF("D:/Test.docx", "D:/Test.pdf");
    }

    public void ConvertToPDF(String docPath, String pdfPath) {
        try {
            InputStream doc = new FileInputStream(new File(docPath));
            XWPFDocument document = new XWPFDocument(doc);
            PdfOptions options = PdfOptions.create();
            OutputStream out = new FileOutputStream(new File(pdfPath));
            PdfConverter.getInstance().convert(document, out, options);
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
}

和JARS:

required jars

享受:)

答案 1 :(得分:13)

你缺少一些图书馆。

我可以通过添加以下库来运行您的代码:

    Apache POI 3.15
    org.apache.poi.xwpf.converter.core-1.0.6.jar
    org.apache.poi.xwpf.converter.pdf-1.0.6.jar
    fr.opensagres.xdocreport.itext.extension-2.0.0.jar
    itext-2.1.7.jar
    ooxml-schemas-1.3.jar

我已成功转换了6页长的Word文档(.docx),其中包含表格,图像和各种格式。

答案 2 :(得分:9)

我将提供3种将docx转换为pdf的方法:

  1. 使用itext,opensagres和apache poi

代码:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ConvertDocToPdfitext {

    public static void main(String[] args) {
  System.out.println( "Starting conversion!!!" );
  ConvertDocToPdfitext cwoWord = new ConvertDocToPdfitext();
   cwoWord.ConvertToPDF("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx", "C:/Users/avijit.shaw/Desktop/testing/docx/Test-1.pdf");
   System.out.println( "Ending conversion!!!" );
}

public void ConvertToPDF(String docPath, String pdfPath) {
    try {
        InputStream doc = new FileInputStream(new File(docPath));
        XWPFDocument document = new XWPFDocument(doc);
        PdfOptions options = PdfOptions.create();
        OutputStream out = new FileOutputStream(new File(pdfPath));
        PdfConverter.getInstance().convert(document, out, options);
    } catch (IOException ex) {
        System.out.println(ex.getMessage());
    }
}

}

依赖关系:使用Maven解决依赖关系。

fr.opensagres.poi.xwpf.converter.core的新版本2.0.2与apache poi 4.0.1和itext 2.17一起运行。您只需要在Maven中添加以下依赖项,然后Maven将自动下载所有依赖项。 (更新了您的Maven项目,因此它下载了所有这些库及其所有依赖项)

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
    <version>2.0.2</version>
</dependency>
  1. 使用Documents4j

注意:您需要在运行此代码的计算机上安装MS Office。

代码:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class Document4jApp {

public static void main(String[] args) {

    File inputWord = new File("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx");
    File outputFile = new File("Test_out.pdf");
    try  {
        InputStream docxInputStream = new FileInputStream(inputWord);
        OutputStream outputStream = new FileOutputStream(outputFile);
        IConverter converter = LocalConverter.builder().build();         
        converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
        outputStream.close();
        System.out.println("success");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

} 依赖关系:使用Maven解决依赖关系。

<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.0.3</version>
</dependency>
  1. 使用openoffice nuoil

注意:您需要在运行此代码的计算机上安装OpenOffice。 代码:

import java.io.File;
import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.BootstrapException;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.Exception;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;

import ooo.connector.BootstrapSocketConnector;

public class App {
    public static void main(String[] args) throws Exception, BootstrapException {
        System.out.println("Stating conversion!!!");
        // Initialise
        String oooExeFolder = "C:\\Program Files (x86)\\OpenOffice 4\\program"; //Provide path on which OpenOffice is installed
        XComponentContext xContext = BootstrapSocketConnector.bootstrap(oooExeFolder);
        XMultiComponentFactory xMCF = xContext.getServiceManager();

        Object oDesktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext);

        XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(XDesktop.class, oDesktop);

        // Load the Document
        String workingDir = "C:/Users/avijit.shaw/Desktop/testing/docx/"; //Provide directory path of docx file to be converted
        String myTemplate = workingDir + "Account Opening Prototype Details.docx"; // Name of docx file to be converted

        if (!new File(myTemplate).canRead()) {
            throw new RuntimeException("Cannot load template:" + new File(myTemplate));
        }

        XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
                .queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);

        String sUrl = "file:///" + myTemplate;

        PropertyValue[] propertyValues = new PropertyValue[0];

        propertyValues = new PropertyValue[1];
        propertyValues[0] = new PropertyValue();
        propertyValues[0].Name = "Hidden";


propertyValues[0].Value = new Boolean(true);

        XComponent xComp = xCompLoader.loadComponentFromURL(sUrl, "_blank", 0, propertyValues);

        // save as a PDF
        XStorable xStorable = (XStorable) UnoRuntime.queryInterface(XStorable.class, xComp);

        propertyValues = new PropertyValue[2];
        // Setting the flag for overwriting
        propertyValues[0] = new PropertyValue();
        propertyValues[0].Name = "Overwrite";
        propertyValues[0].Value = new Boolean(true);
        // Setting the filter name
        propertyValues[1] = new PropertyValue();
        propertyValues[1].Name = "FilterName";
        propertyValues[1].Value = "writer_pdf_Export";

        // Appending the favoured extension to the origin document name
        String myResult = workingDir + "letterOutput.pdf"; // Name of pdf file to be output
        xStorable.storeToURL("file:///" + myResult, propertyValues);

        System.out.println("Saved " + myResult);

        // shutdown
        xDesktop.terminate();
    }
}

依赖关系:使用Maven解决依赖关系。

<!-- https://mvnrepository.com/artifact/org.openoffice/unoil -->
    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>unoil</artifactId>
        <version>3.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.openoffice/juh -->
    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>juh</artifactId>
        <version>3.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.openoffice/bootstrap-connector -->
    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>bootstrap-connector</artifactId>
        <version>0.1.1</version>
    </dependency>

答案 3 :(得分:3)

如果您的文档非常丰富,并且您的选择是在Linux / Unix上进行转换,那么all three main options suggested in the thread的实现可能会有些痛苦。

我建议的解决方案是使用Gotenberg由Docker支持的无状态API,用于将HTML,Markdown和Office文档转换为PDF。

  • 启动容器$ docker run --rm -p 3000:3000 thecodingmachine/gotenberg:6
  • 向容器发出请求。这是使用curl的方法:
$ curl --request POST \
    --url http://localhost:3000/convert/office \
    --header 'Content-Type: multipart/form-data' \
    --form files=@document.docx \
    --form files=@document2.docx \
    -o result.pdf

将其部署到您的基础架构(例如,作为单独的微服务),并通过发出简单的HTTP请求的Java服务将其命中。在响应中获取您的PDF文件,然后执行您想要的操作。

经过测试,就像魅力一样!

答案 4 :(得分:0)

我使用此代码。

private byte[] toPdf(ByteArrayOutputStream docx) {
    InputStream isFromFirstData = new ByteArrayInputStream(docx.toByteArray());

    XWPFDocument document = new XWPFDocument(isFromFirstData);
    PdfOptions options = PdfOptions.create();

    //make new file in c:\temp\
    OutputStream out = new FileOutputStream(new File("c:\\tmp\\HelloWord.pdf"));
    PdfConverter.getInstance().convert(document, out, options);

    //return byte array for return in http request.
    ByteArrayOutputStream pdf = new ByteArrayOutputStream();
    PdfConverter.getInstance().convert(document, pdf, options);

    document.write(pdf);
    document.close();
    return pdf.toByteArray();
}

答案 5 :(得分:0)

您需要添加这些Maven依赖项

  <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-excelant</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-examples</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.core</artifactId>
        <version>1.0.6</version>
    </dependency>

    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
        <version>1.0.6</version>
    </dependency>

答案 6 :(得分:0)

我已经做了大量研究,发现Documents4j是将docx转换为pdf的最佳免费API。对齐方式,所有文档都能很好地完成字体工作。

Maven依赖项:

<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.0.3</version>
</dependency>

使用以下代码将docx转换为pdf。

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class Document4jApp {

    public static void main(String[] args) {

        File inputWord = new File("Tests.docx");
        File outputFile = new File("Test_out.pdf");
        try  {
            InputStream docxInputStream = new FileInputStream(inputWord);
            OutputStream outputStream = new FileOutputStream(outputFile);
            IConverter converter = LocalConverter.builder().build();
            converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
            outputStream.close();
            System.out.println("success");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

答案 7 :(得分:0)

仅需手动添加一个依赖项(其他应自动添加)。截至目前,最新版本为2.0.2。

等级:

dependencies {
    // What you should already have
    implementation "org.apache.poi:poi-ooxml:latest.release"


    // ADD THE BELOW LINE TO DEPENDENCIES BLOCK IN build.gradle
    implementation 'fr.opensagres.xdocreport:fr.opensagres.poi.xwpf.converter.pdf:2.0.2'
}

仅行家:

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
    <version>2.0.2</version>
</dependency>

答案 8 :(得分:0)

Docx4j 是开源,是将 Docx 转换为 pdf 且没有任何对齐或字体问题的最佳 API。

Maven 依赖

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>8.0.0</version>
</dependency>

代码

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

public class DocToPDF {

    public static void main(String[] args) {
        
        try {
            InputStream templateInputStream = new FileInputStream("D:\\\\Workspace\\\\New\\\\Sample.docx");
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
            MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

            String outputfilepath = "D:\\\\Workspace\\\\New\\\\Sample.pdf";
            FileOutputStream os = new FileOutputStream(outputfilepath);
            Docx4J.toPDF(wordMLPackage,os);
            os.flush();
            os.close();
        } catch (Throwable e) {

            e.printStackTrace();
        } 
    }

}

答案 9 :(得分:0)

我有一个非常复杂的文档,Microsoft Graph Api 帮助生成了正确的 pdf,格式与输入 Docx 中的格式完全相同

以下链接有助于完成应用注册以设置 API 密钥/秘密以访问图形 API https://medium.com/medialesson/convert-files-to-pdf-using-microsoft-graph-azure-functions-20bc84d2adc4

https://devzigma.com/java/upload-files-to-sharepoint-using-java/

docx 文档上传后,请查看此文档以下载同一 docx 的 pdf 版本:https://docs.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-1.0&tabs=java