Converting DOCX to HTML, including images

Time: 2014-04-11 06:47:15

Tags: java html docx docx4j

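I am using docx4j to convert DOCX to HTML. The conversion itself works and I get HTML output, which I then embed in the body of an email. However, I have the following problems:

  1. Images are not displayed in the email body
  2. Spaces and bullets are lost

Here is the code I wrote: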

    WordprocessingMLPackage wordMLPackage;
    wordMLPackage = Docx4J.load(new java.io.File(resourcePath2));
    HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
    htmlSettings.setImageDirPath(imageFolder + resourcePath2 + "_files"); 
    htmlSettings.setImageTargetUri(imageFolder +resourcePath2.substring(resourcePath2.lastIndexOf("/")+1) + "_files");
    htmlSettings.setWmlPackage(wordMLPackage);
    
    OutputStream os; 
    os = new ByteArrayOutputStream();
    Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);
    Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_SAVE_FLAT_XML);
    DOCX = ((ByteArrayOutputStream)os).toString();
    

2 answers:

Answer 0: (score: 2)

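You can add something like the following to your code. Note that it uses Apache POI with the XDocReport XHTML converter rather than docx4j: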

package tcg.doc.web.managedBeans;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

@Component
@Scope("session")
@Qualifier("ConvertWord")
public class ConvertWord {
    private static final String docName = "TestDocx.docx";
    private static final String outputlFolderPath = "d:/";


    String htmlNamePath = "docHtml.html";
    String zipName="_tmp.zip";
    File docFile = new File(outputlFolderPath+docName);
    File zipFile = new File(zipName);




      public void ConvertWordToHtml() {

          // try-with-resources closes both streams even if the conversion fails
          try (InputStream doc = new FileInputStream(new File(outputlFolderPath+docName));
               OutputStream out = new FileOutputStream(new File(htmlPath()))) {

                // 1) Load DOCX into XWPFDocument
                XWPFDocument document = new XWPFDocument(doc);

                // 2) Prepare XHTML options
                XHTMLOptions options = XHTMLOptions.create();

                // Extract embedded images into target/images/<docName>
                // (the original concatenated the InputStream object here, which
                // produced a folder named after the stream's toString() value)
                String root = "target";
                File imageFolder = new File( root + "/images/" + docName );
                options.setExtractor( new FileImageExtractor( imageFolder ) );
                // Make image URIs in the generated HTML point at that same folder
                options.URIResolver( new FileURIResolver( imageFolder ) );

                // 3) Convert the document and write the XHTML to the output file
                XHTMLConverter.getInstance().convert(document, out, options);

                System.out.println("HTML written to " + htmlPath());
            } catch (FileNotFoundException ex) {
                // Report the problem instead of silently swallowing it
                System.err.println("File not found: " + ex.getMessage());
            } catch (IOException ex) {
                ex.printStackTrace();
            }
         }

      public static void main(String[] args) {
         ConvertWord cwoWord=new ConvertWord();
         cwoWord.ConvertWordToHtml();
         System.out.println();
    }



      public String htmlPath(){
        // d:/docHtml.html
          return outputlFolderPath+htmlNamePath;
      }

      public String zipPath(){
          // d:/_tmp.zip
          return outputlFolderPath+zipName;
      }

}

The Maven dependency for pom.xml:

<dependency>
  <groupId>fr.opensagres.xdocreport</groupId>
  <artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
  <version>1.0.4</version>
</dependency>

Or download it from Here.

Answer 1: (score: 0)

For images to work in an email body, I guess you need to either use data URIs or publish the images to a web-accessible location.

Either way, you will need to write an implementation of:

public interface ConversionImageHandler {

/**
 * @param picture 
 * @param relationship of the image 
 * @param part of the image, if it is an internal image, otherwise null
 * @return uri for the image we've saved, or null
 * @throws Docx4JException this exception will be logged, but not propagated
 */
public String handleImage(AbstractWordXmlPicture picture, Relationship relationship, BinaryPart part) throws Docx4JException;
}

and configure docx4j to use it via htmlSettings.setImageHandler.

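You can look at some of the existing implementations in the docx4j source code and make use of the helper methods in AbstractConversionImageHandler (for example, createEncodedImage if you want data URIs).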
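For reference, a data URI is just the image's MIME type followed by its Base64-encoded bytes. The following is a minimal sketch using only the JDK (Java 8+), to show what a handler would need to return for an inline image. It is not docx4j's createEncodedImage, whose exact output may differ, and the class and method names (DataUriExample, toDataUri) are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUriExample {

    // Builds an RFC 2397 data URI: data:<mime type>;base64,<payload>
    public static String toDataUri(String mimeType, byte[] imageBytes) {
        String payload = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + payload;
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for real image data (assumption for the demo)
        byte[] fakeImage = "hello".getBytes(StandardCharsets.UTF_8);
        String uri = toDataUri("image/png", fakeImage);
        // The resulting URI can be used directly as an <img> src attribute
        System.out.println("<img src=\"" + uri + "\"/>");
    }
}
```

Support for data URIs varies between mail clients, so it is worth testing the result in the clients you care about before relying on this approach.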