Reading images from an MS Word doc in Java

Date: 2011-03-02 04:27:59

Tags: java ms-word

I want to use Java to read the images embedded in an MS Word document and reconstruct them as image files. Can you suggest a Java library for this task?

1 answer:

Answer 0: (score: -1)

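See Apache POI, a Java API for working with Microsoft Word files.

Here is a short snippet that opens a .doc file and prints the paragraph count and the length of each paragraph. Take a look.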

import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import java.io.*;

public class ReadDoc
{
    public static void main( String[] args )
    {
        String filesname = "Hello.doc";
        // try-with-resources closes the input stream even if parsing fails
        try (FileInputStream in = new FileInputStream(filesname))
        {
            // Open the binary .doc file as an OLE2 (POIFS) file system
            POIFSFileSystem fs = new POIFSFileSystem(in);

            HWPFDocument doc = new HWPFDocument(fs);

            WordExtractor we = new WordExtractor(doc);

            String[] paragraphs = we.getParagraphText();

            System.out.println( "Word Document has " + paragraphs.length + " paragraphs" );
            for( int i=0; i<paragraphs.length; i++ ) {
                // Strip the trailing line terminator from each paragraph
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n","");
                System.out.println( "Length:"+paragraphs[ i ].length());
            }
        }
        catch(Exception e) {
            e.printStackTrace();
        }
    }
}
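
Note that the snippet above only reads paragraph text; it does not touch images. For the binary .doc format, POI's HWPFDocument exposes a pictures table (getPicturesTable().getAllPictures()), which is probably the route to follow with this library. As a dependency-free illustration of the "reconstruct the image files" step, here is a minimal sketch that assumes the document is in the newer .docx format instead. A .docx file is a ZIP archive that conventionally stores embedded images under word/media/, so the JDK alone can copy them out. The class name DocxImageExtractor, the method extractImages, and the file names Hello.docx and images are all hypothetical names chosen for this example, and the sketch will not work on a binary .doc file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for this sketch; works on .docx (ZIP) files only, not binary .doc.
public class DocxImageExtractor {

    // Copies every entry under word/media/ in the .docx archive into outDir
    // and returns the paths of the files that were written.
    public static List<Path> extractImages(Path docx, Path outDir) throws IOException {
        List<Path> written = new ArrayList<>();
        Files.createDirectories(outDir);
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Assumption: embedded images live under word/media/
                if (entry.isDirectory() || !name.startsWith("word/media/")) {
                    continue;
                }
                // Keep only the file name so an entry cannot escape outDir
                String fileName = Paths.get(name).getFileName().toString();
                Path target = outDir.resolve(fileName);
                // Reads the current entry to its end; does not close the zip stream
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                written.add(target);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // "Hello.docx" and "images" are placeholder names for this sketch
        Path docx = Paths.get(args.length > 0 ? args[0] : "Hello.docx");
        Path outDir = Paths.get(args.length > 1 ? args[1] : "images");
        for (Path image : extractImages(docx, outDir)) {
            System.out.println("Wrote " + image);
        }
    }
}
```

Run it as java DocxImageExtractor Hello.docx images; each image is written with the file name it has inside the archive. Images that the document links to rather than embeds are not in the archive, so they will not be extracted.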