我正在尝试将包含表格和图像的docx
文件转换为pdf
格式文件。
我一直在各处搜索,但没有得到适当的解决方案,要求提供正确和正确的解决方案:
这里我尝试过:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class TestCon {
public static void main(String[] args) {
TestCon cwoWord = new TestCon();
System.out.println("Start");
cwoWord.ConvertToPDF("D:\\Test.docx", "D:\\Test1.pdf");
}
public void ConvertToPDF(String docPath, String pdfPath) {
try {
InputStream doc = new FileInputStream(new File(docPath));
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(pdfPath));
PdfConverter.getInstance().convert(document, out, options);
System.out.println("Done");
} catch (FileNotFoundException ex) {
System.out.println(ex.getMessage());
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
例外:
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log(ILjava/lang/Object;)V from class org.apache.poi.openxml4j.opc.PackageRelationshipCollection
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:313)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:162)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:130)
at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:559)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:239)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:665)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:121)
at test.TestCon.ConvertToPDF(TestCon.java:31)
at test.TestCon.main(TestCon.java:25)
我的要求是创建一个java代码,将现有的docx转换为具有正确格式和对齐方式的pdf。
请建议。
使用的罐子:
答案 0 :(得分:17)
除了VivekRatanSinha answer之外,我还想为将来需要它的人发布完整的代码和必需的jar。
<强>代码:强>
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class WordConvertPDF {
public static void main(String[] args) {
WordConvertPDF cwoWord = new WordConvertPDF();
cwoWord.ConvertToPDF("D:/Test.docx", "D:/Test.pdf");
}
public void ConvertToPDF(String docPath, String pdfPath) {
try {
InputStream doc = new FileInputStream(new File(docPath));
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(pdfPath));
PdfConverter.getInstance().convert(document, out, options);
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
和JARS:
享受:)
答案 1 :(得分:13)
你缺少一些图书馆。
我可以通过添加以下库来运行您的代码:
Apache POI 3.15 org.apache.poi.xwpf.converter.core-1.0.6.jar org.apache.poi.xwpf.converter.pdf-1.0.6.jar fr.opensagres.xdocreport.itext.extension-2.0.0.jar itext-2.1.7.jar ooxml-schemas-1.3.jar
我已成功转换了6页长的Word文档(.docx),其中包含表格,图像和各种格式。
答案 2 :(得分:9)
我将提供3种将docx转换为pdf的方法:
代码:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class ConvertDocToPdfitext {
public static void main(String[] args) {
System.out.println( "Starting conversion!!!" );
ConvertDocToPdfitext cwoWord = new ConvertDocToPdfitext();
cwoWord.ConvertToPDF("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx", "C:/Users/avijit.shaw/Desktop/testing/docx/Test-1.pdf");
System.out.println( "Ending conversion!!!" );
}
public void ConvertToPDF(String docPath, String pdfPath) {
try {
InputStream doc = new FileInputStream(new File(docPath));
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(pdfPath));
PdfConverter.getInstance().convert(document, out, options);
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
依赖关系:使用Maven解决依赖关系。
fr.opensagres.poi.xwpf.converter.core的新版本2.0.2与apache poi 4.0.1和itext 2.17一起运行。您只需要在Maven中添加以下依赖项,然后Maven将自动下载所有依赖项。 (更新了您的Maven项目,因此它下载了所有这些库及其所有依赖项)
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
<version>2.0.2</version>
</dependency>
注意:您需要在运行此代码的计算机上安装MS Office。
代码:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;
public class Document4jApp {
public static void main(String[] args) {
File inputWord = new File("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx");
File outputFile = new File("Test_out.pdf");
try {
InputStream docxInputStream = new FileInputStream(inputWord);
OutputStream outputStream = new FileOutputStream(outputFile);
IConverter converter = LocalConverter.builder().build();
converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
outputStream.close();
System.out.println("success");
} catch (Exception e) {
e.printStackTrace();
}
}
} 依赖关系:使用Maven解决依赖关系。
<dependency>
<groupId>com.documents4j</groupId>
<artifactId>documents4j-local</artifactId>
<version>1.0.3</version>
</dependency>
<dependency>
<groupId>com.documents4j</groupId>
<artifactId>documents4j-transformer-msoffice-word</artifactId>
<version>1.0.3</version>
</dependency>
注意:您需要在运行此代码的计算机上安装OpenOffice。 代码:
import java.io.File;
import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.BootstrapException;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.Exception;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;
import ooo.connector.BootstrapSocketConnector;
public class App {
public static void main(String[] args) throws Exception, BootstrapException {
System.out.println("Stating conversion!!!");
// Initialise
String oooExeFolder = "C:\\Program Files (x86)\\OpenOffice 4\\program"; //Provide path on which OpenOffice is installed
XComponentContext xContext = BootstrapSocketConnector.bootstrap(oooExeFolder);
XMultiComponentFactory xMCF = xContext.getServiceManager();
Object oDesktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext);
XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(XDesktop.class, oDesktop);
// Load the Document
String workingDir = "C:/Users/avijit.shaw/Desktop/testing/docx/"; //Provide directory path of docx file to be converted
String myTemplate = workingDir + "Account Opening Prototype Details.docx"; // Name of docx file to be converted
if (!new File(myTemplate).canRead()) {
throw new RuntimeException("Cannot load template:" + new File(myTemplate));
}
XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
.queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);
String sUrl = "file:///" + myTemplate;
PropertyValue[] propertyValues = new PropertyValue[0];
propertyValues = new PropertyValue[1];
propertyValues[0] = new PropertyValue();
propertyValues[0].Name = "Hidden";
propertyValues[0].Value = new Boolean(true);
XComponent xComp = xCompLoader.loadComponentFromURL(sUrl, "_blank", 0, propertyValues);
// save as a PDF
XStorable xStorable = (XStorable) UnoRuntime.queryInterface(XStorable.class, xComp);
propertyValues = new PropertyValue[2];
// Setting the flag for overwriting
propertyValues[0] = new PropertyValue();
propertyValues[0].Name = "Overwrite";
propertyValues[0].Value = new Boolean(true);
// Setting the filter name
propertyValues[1] = new PropertyValue();
propertyValues[1].Name = "FilterName";
propertyValues[1].Value = "writer_pdf_Export";
// Appending the favoured extension to the origin document name
String myResult = workingDir + "letterOutput.pdf"; // Name of pdf file to be output
xStorable.storeToURL("file:///" + myResult, propertyValues);
System.out.println("Saved " + myResult);
// shutdown
xDesktop.terminate();
}
}
依赖关系:使用Maven解决依赖关系。
<!-- https://mvnrepository.com/artifact/org.openoffice/unoil -->
<dependency>
<groupId>org.openoffice</groupId>
<artifactId>unoil</artifactId>
<version>3.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.openoffice/juh -->
<dependency>
<groupId>org.openoffice</groupId>
<artifactId>juh</artifactId>
<version>3.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.openoffice/bootstrap-connector -->
<dependency>
<groupId>org.openoffice</groupId>
<artifactId>bootstrap-connector</artifactId>
<version>0.1.1</version>
</dependency>
答案 3 :(得分:3)
如果您的文档非常丰富,并且您的选择是在Linux / Unix上进行转换,那么all three main options suggested in the thread的实现可能会有些痛苦。
我建议的解决方案是使用Gotenberg:由Docker支持的无状态API,用于将HTML,Markdown和Office文档转换为PDF。
$ docker run --rm -p 3000:3000 thecodingmachine/gotenberg:6
curl
的方法:
$ curl --request POST \
--url http://localhost:3000/convert/office \
--header 'Content-Type: multipart/form-data' \
--form files=@document.docx \
--form files=@document2.docx \
-o result.pdf
将其部署到您的基础架构(例如,作为单独的微服务),并通过发出简单的HTTP请求的Java服务将其命中。在响应中获取您的PDF文件,然后执行您想要的操作。
经过测试,就像魅力一样!
答案 4 :(得分:0)
我使用此代码。
private byte[] toPdf(ByteArrayOutputStream docx) {
InputStream isFromFirstData = new ByteArrayInputStream(docx.toByteArray());
XWPFDocument document = new XWPFDocument(isFromFirstData);
PdfOptions options = PdfOptions.create();
//make new file in c:\temp\
OutputStream out = new FileOutputStream(new File("c:\\tmp\\HelloWord.pdf"));
PdfConverter.getInstance().convert(document, out, options);
//return byte array for return in http request.
ByteArrayOutputStream pdf = new ByteArrayOutputStream();
PdfConverter.getInstance().convert(document, pdf, options);
document.write(pdf);
document.close();
return pdf.toByteArray();
}
答案 5 :(得分:0)
您需要添加这些Maven依赖项
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-scratchpad</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml-schemas</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-excelant</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-examples</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>org.apache.poi.xwpf.converter.core</artifactId>
<version>1.0.6</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
<version>1.0.6</version>
</dependency>
答案 6 :(得分:0)
我已经做了大量研究,发现Documents4j是将docx转换为pdf的最佳免费API。对齐方式,所有文档都能很好地完成字体工作。
Maven依赖项:
<dependency>
<groupId>com.documents4j</groupId>
<artifactId>documents4j-local</artifactId>
<version>1.0.3</version>
</dependency>
<dependency>
<groupId>com.documents4j</groupId>
<artifactId>documents4j-transformer-msoffice-word</artifactId>
<version>1.0.3</version>
</dependency>
使用以下代码将docx转换为pdf。
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;
public class Document4jApp {
public static void main(String[] args) {
File inputWord = new File("Tests.docx");
File outputFile = new File("Test_out.pdf");
try {
InputStream docxInputStream = new FileInputStream(inputWord);
OutputStream outputStream = new FileOutputStream(outputFile);
IConverter converter = LocalConverter.builder().build();
converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
outputStream.close();
System.out.println("success");
} catch (Exception e) {
e.printStackTrace();
}
}
}
答案 7 :(得分:0)
仅需手动添加一个依赖项(其他应自动添加)。截至目前,最新版本为2.0.2。
等级:
dependencies {
// What you should already have
implementation "org.apache.poi:poi-ooxml:latest.release"
// ADD THE BELOW LINE TO DEPENDENCIES BLOCK IN build.gradle
implementation 'fr.opensagres.xdocreport:fr.opensagres.poi.xwpf.converter.pdf:2.0.2'
}
仅行家:
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
<version>2.0.2</version>
</dependency>
答案 8 :(得分:0)
Docx4j 是开源,是将 Docx 转换为 pdf 且没有任何对齐或字体问题的最佳 API。
Maven 依赖:
<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j-JAXB-Internal</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j-JAXB-MOXy</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j-export-fo</artifactId>
<version>8.0.0</version>
</dependency>
代码:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
public class DocToPDF {
public static void main(String[] args) {
try {
InputStream templateInputStream = new FileInputStream("D:\\\\Workspace\\\\New\\\\Sample.docx");
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
String outputfilepath = "D:\\\\Workspace\\\\New\\\\Sample.pdf";
FileOutputStream os = new FileOutputStream(outputfilepath);
Docx4J.toPDF(wordMLPackage,os);
os.flush();
os.close();
} catch (Throwable e) {
e.printStackTrace();
}
}
}
答案 9 :(得分:0)
我有一个非常复杂的文档,Microsoft Graph Api 帮助生成了正确的 pdf,格式与输入 Docx 中的格式完全相同
以下链接有助于完成应用注册以设置 API 密钥/秘密以访问图形 API https://medium.com/medialesson/convert-files-to-pdf-using-microsoft-graph-azure-functions-20bc84d2adc4
https://devzigma.com/java/upload-files-to-sharepoint-using-java/
docx 文档上传后,请查看此文档以下载同一 docx 的 pdf 版本:https://docs.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-1.0&tabs=java