How do I read a .doc file line by line in Java, with all the necessary jar files?

Posted: 2014-12-31 01:51:22

Tags: java apache file xmlbeans

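I want to show the differences between two .doc files line by line. I already did this with .txt files and it worked perfectly. For that, I used the following code: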

        // File1 and File2 are java.io.File objects pointing at the two .txt files.
        FileReader File1Reader = new FileReader(File1.getPath());
        FileReader File2Reader = new FileReader(File2.getPath());

        // Wrap each reader in a BufferedReader so the files can be read one line at a time.
        BufferedReader File1BufRdr = new BufferedReader(File1Reader);
        BufferedReader File2BufRdr = new BufferedReader(File2Reader);

        // Read the first line of each file (readLine() returns null at end of file).
        String File1Content = File1BufRdr.readLine();
        String File2Content = File2BufRdr.readLine();

        // StringBuilder used to build the output.
        StringBuilder buffer = new StringBuilder();
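For reference, the plain-text comparison described above can be written as a complete, self-contained program. This is only a sketch, assuming the goal is to report every line number where the two files differ; the class name TextLineDiff and the method diffLines are hypothetical and not part of the original code.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compares two text files line by line.
public class TextLineDiff {

    // Returns one message per line number whose content differs.
    // A line that exists in only one file is shown as "<missing>" on the other side.
    public static List<String> diffLines(String path1, String path2) throws IOException {
        List<String> diffs = new ArrayList<>();
        // try-with-resources closes both readers even if an exception is thrown
        try (BufferedReader r1 = new BufferedReader(new FileReader(path1));
             BufferedReader r2 = new BufferedReader(new FileReader(path2))) {
            int lineNo = 1;
            String l1 = r1.readLine();
            String l2 = r2.readLine();
            // Continue until both files are exhausted (readLine() returns null at EOF).
            while (l1 != null || l2 != null) {
                if (l1 == null || !l1.equals(l2)) {
                    diffs.add(lineNo + ": " + (l1 == null ? "<missing>" : l1)
                            + " | " + (l2 == null ? "<missing>" : l2));
                }
                l1 = r1.readLine();
                l2 = r2.readLine();
                lineNo++;
            }
        }
        return diffs;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TextLineDiff first.txt second.txt
        for (String d : diffLines(args[0], args[1])) {
            System.out.println(d);
        }
    }
}
```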

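Is there a way to read a .doc file line by line? I am using the following code to read from a .doc file, but it does not read line by line. Here is the code: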

    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    public class read_From_Doc_Docx {
        public static void main(String[] args) {

            // Alternate between the two to check what works.
            //String filePath = "D:\\Users\\username\\Desktop\\Doc1.docx";
            String filePath = "/Users/esna786/Removal of Redundancy.docx";

            if (filePath.toLowerCase().endsWith(".docx")) { // .docx (OOXML) is handled by XWPF
                // try-with-resources closes the input stream when done
                try (FileInputStream fis = new FileInputStream(filePath)) {
                    XWPFDocument doc = new XWPFDocument(fis);
                    XWPFWordExtractor extract = new XWPFWordExtractor(doc);
                    System.out.println(extract.getText());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } else { // legacy binary .doc is handled by HWPF
                try (FileInputStream fis = new FileInputStream(filePath)) {
                    HWPFDocument doc = new HWPFDocument(fis);
                    WordExtractor extractor = new WordExtractor(doc);
                    System.out.println(extractor.getText());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
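Because getText() returns the whole document as a single String, one workaround (a sketch, not taken from the answer below) is to split that String into lines yourself. Note that a Word file has no fixed notion of "lines", since line wrapping depends on page layout, so what you get is roughly one entry per paragraph. This assumes Java 11+ for String.lines(), which splits on \n, \r and \r\n; the class ExtractedTextLines and the method toLines are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ExtractedTextLines {

    // Hypothetical helper: splits text returned by getText() into lines
    // and drops blank lines left behind by empty paragraphs.
    public static List<String> toLines(String extractedText) {
        return extractedText.lines()
                .filter(line -> !line.isBlank())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample text mixing \r\n, \n and \r line endings.
        String sample = "First paragraph\r\nSecond paragraph\n\nThird\r";
        for (String line : toLines(sample)) {
            System.out.println(line);
        }
    }
}
```

The resulting lists from the two documents can then be compared element by element, just like the lines of the .txt files.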

1 answer:

Answer 0 (score: 1)

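Just use WordExtractor's getParagraphText() method instead of getText(). It returns a String[] with one element per paragraph, which you can then compare the same way as the lines of a .txt file.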
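A minimal sketch of that approach, assuming Apache POI (poi and poi-scratchpad for .doc, poi-ooxml for .docx) is on the classpath. getParagraphText() exists only on the HWPF WordExtractor, not on XWPFWordExtractor, so for .docx this sketch walks XWPFDocument.getParagraphs() instead; that list holds top-level body paragraphs only, so text inside tables is skipped. The class WordParagraphs and the method readParagraphs are hypothetical names.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical helper: reads a Word file as a list of paragraphs ("lines").
public class WordParagraphs {

    public static List<String> readParagraphs(String path) throws IOException {
        List<String> result = new ArrayList<>();
        try (InputStream in = new FileInputStream(path)) {
            if (path.toLowerCase().endsWith(".docx")) {
                // .docx (OOXML): walk the top-level body paragraphs.
                XWPFDocument doc = new XWPFDocument(in);
                for (XWPFParagraph p : doc.getParagraphs()) {
                    result.add(p.getText());
                }
            } else {
                // .doc (binary): one String per paragraph; trailing line
                // breaks that may be included are removed here.
                WordExtractor extractor = new WordExtractor(new HWPFDocument(in));
                for (String p : extractor.getParagraphText()) {
                    result.add(p.replaceAll("[\\r\\n]+$", ""));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WordParagraphs first.doc second.doc
        List<String> a = readParagraphs(args[0]);
        List<String> b = readParagraphs(args[1]);
        int n = Math.max(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            String l1 = i < a.size() ? a.get(i) : "<missing>";
            String l2 = i < b.size() ? b.get(i) : "<missing>";
            if (!l1.equals(l2)) {
                System.out.println((i + 1) + ": " + l1 + " | " + l2);
            }
        }
    }
}
```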