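I am using the OneDrive SDK to download a .docx file from OneDrive. The download succeeds, but I need to convert the file to .txt format and have not managed to do it.
Does anyone know how to convert a .docx file, or extract its text, on Android?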
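I can get an InputStream for the .docx file.
Here is the code that downloads the file from OneDrive; it already runs inside doInBackground:
    // Request the file content from OneDrive as a stream.
    InputStream inputStream = iOneDriveClient.getDrive().getItems(fileID).getContent().buildRequest().get();
    // Copy the stream to the local file at mPath in 1 KB chunks.
    OutputStream out = new FileOutputStream(mPath);
    int read;
    byte[] bytes = new byte[1024];
    while ((read = inputStream.read(bytes)) != -1) {
        out.write(bytes, 0, read);
    }
    out.flush();
    out.close();
    inputStream.close();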
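As an aside, the copy loop above leaves both streams open if a read or write throws. A minimal sketch of the same loop with try-with-resources follows, assuming only plain java.io; the class name StreamCopy and the method copyToFile are hypothetical helpers for illustration, not part of the OneDrive SDK.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of any SDK.
final class StreamCopy {
    private StreamCopy() {}

    /** Copies all bytes from in to target and returns the number of bytes written. */
    static long copyToFile(InputStream in, File target) throws IOException {
        long total = 0;
        // try-with-resources closes both streams even if read() or write() throws.
        try (InputStream source = in;
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

With the stream from the question, the call would look like StreamCopy.copyToFile(inputStream, new File(mPath)).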
Edit
I added the Apache POI library, but the project does not compile.
I get conflicts on many files.
This is my build.gradle
The conflict error is:
Duplicate class com.fasterxml.jackson.core.Base64Variant found in modules docx4j-6.1.1-SNAPSHOT-shaded.jar (docx4j-6.1.1-SNAPSHOT-shaded.jar) and jackson-core-2.9.6.jar (com.fasterxml.jackson.core:jackson-core:2.9.6)
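Answer 0: (score: 0)
You can use Apache POI.
From the documentation:
For .doc files from Word 97 - Word 2003, in scratchpad there is org.apache.poi.hwpf.extractor.WordExtractor, which will return text for your document.
Here is an example from the documentation: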
    FileInputStream fis = new FileInputStream(inputFile);
    POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
    // Firstly, get an extractor for the Workbook
    POIOLE2TextExtractor oleTextExtractor =
        ExtractorFactory.createExtractor(fileSystem);
    // Then a List of extractors for any embedded Excel, Word, PowerPoint
    // or Visio objects embedded into it.
    POITextExtractor[] embeddedExtractors =
        ExtractorFactory.getEmbededDocsTextExtractors(oleTextExtractor);
    for (POITextExtractor textExtractor : embeddedExtractors) {
        // If the embedded object was an Excel spreadsheet.
        if (textExtractor instanceof ExcelExtractor) {
            ExcelExtractor excelExtractor = (ExcelExtractor) textExtractor;
            System.out.println(excelExtractor.getText());
        }
        // A Word Document
        else if (textExtractor instanceof WordExtractor) {
            WordExtractor wordExtractor = (WordExtractor) textExtractor;
            String[] paragraphText = wordExtractor.getParagraphText();
            for (String paragraph : paragraphText) {
                System.out.println(paragraph);
            }
            // Display the document's header and footer text
            System.out.println("Footer text: " + wordExtractor.getFooterText());
            System.out.println("Header text: " + wordExtractor.getHeaderText());
        }
        // PowerPoint Presentation.
        else if (textExtractor instanceof PowerPointExtractor) {
            PowerPointExtractor powerPointExtractor =
                (PowerPointExtractor) textExtractor;
            System.out.println("Text: " + powerPointExtractor.getText());
            System.out.println("Notes: " + powerPointExtractor.getNotes());
        }
    }
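Note that the WordExtractor quoted above targets the binary .doc format, while the question is about .docx. A .docx file is a ZIP archive whose main text lives in word/document.xml, in w:t elements grouped into w:p paragraphs. So one possible dependency-free route, which may also sidestep the duplicate-class conflict, is to read that entry with the standard zip and XML classes. The following is only a rough sketch under that assumption: the class name DocxText and the method extract are hypothetical, and it ignores tabs, line breaks, headers, footers and footnotes, and may repeat text that sits in text boxes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: pulls plain text out of a .docx stream without Apache POI.
final class DocxText {
    // Namespace of WordprocessingML elements such as w:p and w:t.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxText() {}

    /** Returns the body text of the .docx, one line per paragraph. */
    static String extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // The main document part of a .docx archive.
                if ("word/document.xml".equals(entry.getName())) {
                    return textOf(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    private static String textOf(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the getElementsByTagNameNS lookups
        Document doc = factory.newDocumentBuilder().parse(xml);

        StringBuilder text = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            // Concatenate every text run inside this paragraph.
            NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString();
    }
}
```

Passing the InputStream obtained from OneDrive to DocxText.extract would return a String that can then be written to a .txt file with any Writer. If Apache POI does end up on the classpath, its XWPF classes are the .docx counterpart of the HWPF extractor quoted above.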