Java: Stylesheet for a Microsoft Word document to HTML converter

Time: 2013-04-16 04:20:17

Tags: java doc apache-tika

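As part of a requirement, I am trying to convert doc or docx (Microsoft Word) files to HTML using Apache Tika.

I ended up with the following code. It works, but it does not add any stylesheet to the generated HTML.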

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.StringWriter;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.tika.detect.DefaultDetector;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;

    public class DocxConvert {

        public static void main(String[] args) {
            // try-with-resources closes the stream even if parsing fails
            try (InputStream input = new FileInputStream("f:\\file.doc")) {
                StringWriter sw = new StringWriter();

                // Serializer that turns the parser's SAX events into HTML text
                SAXTransformerFactory factory =
                        (SAXTransformerFactory) SAXTransformerFactory.newInstance();
                TransformerHandler handler = factory.newTransformerHandler();
                handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
                handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
                handler.setResult(new StreamResult(sw));

                // Detect the document type and parse it into the handler
                DefaultDetector detector = new DefaultDetector();
                Metadata metadata = new Metadata();
                Parser parser = new AutoDetectParser(detector);
                parser.parse(input, handler, metadata, new ParseContext());

                System.out.print(sw.toString());
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }

Is there any way to add or generate a stylesheet for the output? Please help!

2 answers:

Answer 0: (score: 0)

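You can use unoconv, which requires OpenOffice or LibreOffice. Download it from here. It converts doc, docx, xls and similar files to pdf from the server command line. If you want to display the resulting pdf embedded in a page served by Apache or Apache Tomcat, I think pdf.js is a good solution.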
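For anyone who wants to drive that conversion from Java, here is a minimal sketch rather than a tested recipe. It assumes unoconv is installed and on the PATH, and that its `-f` option selects the output format. The class name `UnoconvRunner`, its methods, and the input path are made up for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shells out to the unoconv command-line tool.
class UnoconvRunner {

    // Builds the argument list: unoconv -f <format> <inputPath>
    static List<String> buildCommand(String format, String inputPath) {
        return Arrays.asList("unoconv", "-f", format, inputPath);
    }

    // Runs unoconv, waits for it to finish and returns its exit code.
    static int convert(String format, String inputPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(format, inputPath))
                .inheritIO() // show unoconv's own messages on our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Assumed input path, taken from the question's example
        int exitCode = convert("pdf", "f:\\file.doc");
        System.out.println("unoconv exited with code " + exitCode);
    }
}
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.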

Answer 1: (score: 0)

I used version 1.6 of Tika, and it worked fine for me. These are the pom dependencies I used.

http://tika.apache.org/download.html

    <dependencies>
        <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.6</version>
        </dependency>
        <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-parsers</artifactId>
            <version>1.6</version>
        </dependency>
    </dependencies>
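
On the stylesheet itself: as far as I know, Tika emits structural XHTML only and does not carry over Word's formatting, so any CSS has to be your own. One possible approach, sketched below under that assumption, is to wrap the `TransformerHandler` from the question in a SAX filter that writes a `<style>` element just before the `head` element closes. The class `StyleInjectingHandler` and its `render` demo method are hypothetical names, not part of Tika. The demo feeds hand-written SAX events instead of a real document, so it runs with the JDK alone.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical SAX filter: forwards every event unchanged and adds a
// <style> element as the last child of <head>.
class StyleInjectingHandler extends XMLFilterImpl {
    private final String css;

    StyleInjectingHandler(ContentHandler downstream, String css) {
        setContentHandler(downstream);
        this.css = css;
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if ("head".equals(localName) || "head".equals(qName)) {
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute("", "type", "type", "CDATA", "text/css");
            // Reuse the namespace of <head> so the new element matches it
            super.startElement(uri, "style", "style", attrs);
            char[] chars = css.toCharArray();
            super.characters(chars, 0, chars.length);
            super.endElement(uri, "style", "style");
        }
        super.endElement(uri, localName, qName);
    }

    // Demo: serializes a tiny hand-made document through the filter.
    static String render(String css) {
        try {
            StringWriter out = new StringWriter();
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
            serializer.setResult(new StreamResult(out));

            ContentHandler h = new StyleInjectingHandler(serializer, css);
            AttributesImpl none = new AttributesImpl();
            char[] text = "Hello".toCharArray();
            h.startDocument();
            h.startElement("", "html", "html", none);
            h.startElement("", "head", "head", none);
            h.endElement("", "head", "head");
            h.startElement("", "body", "body", none);
            h.startElement("", "p", "p", none);
            h.characters(text, 0, text.length);
            h.endElement("", "p", "p");
            h.endElement("", "body", "body");
            h.endElement("", "html", "html");
            h.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("p { margin: 0; }"));
    }
}
```

In the question's code this would mean passing `new StyleInjectingHandler(handler, css)` to `parser.parse(...)` in place of `handler`. The same filter could write a `<link rel="stylesheet">` element instead if the CSS should live in a separate file.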