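I have a Microsoft Word document saved as an .htm web page. My code is below. My question is how to get the text out of the document and append it to a string. I noticed that the paragraphs are marked with the tag <p class=MsoNormal>,
so any suggestions are welcome. The string I want to append to is documentText.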
// Imports needed at the top of the file:
import java.io.BufferedReader;
import java.io.FileReader;

String documentText = "";
// Read the file line by line as text. DataInputStream.readLine() is deprecated,
// and available() != 0 is not a reliable end-of-file test.
try (BufferedReader reader = new BufferedReader(new FileReader(selectedFile))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}
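
One possible way to collect the paragraph text into documentText is sketched below. It assumes the whole file has already been read into a single string, and it uses a regular expression as a stand-in for a real HTML parser, so it will only cope with simple, non-nested <p class=MsoNormal> paragraphs like the ones Word usually writes. The class name WordHtmlText and the method extractParagraphText are hypothetical names chosen for illustration, and only a handful of common entities are decoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordHtmlText {
    // Matches <p ... class=MsoNormal ...> ... </p>, with or without quotes
    // around the class name, ignoring case and spanning line breaks.
    private static final Pattern PARAGRAPH = Pattern.compile(
        "<p\\b[^>]*\\bclass\\s*=\\s*[\"']?MsoNormal\\b[^>]*>(.*?)</p>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: returns the text of every MsoNormal paragraph,
    // one paragraph per line. Empty paragraphs are skipped.
    public static String extractParagraphText(String html) {
        StringBuilder documentText = new StringBuilder();
        Matcher m = PARAGRAPH.matcher(html);
        while (m.find()) {
            String text = m.group(1)
                .replaceAll("(?s)<[^>]+>", "")  // drop nested tags such as <span> or <o:p>
                .replace("&nbsp;", " ")         // decode a few common entities only
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&")
                .replaceAll("\\s+", " ")        // collapse line breaks and runs of spaces
                .trim();
            if (!text.isEmpty()) {
                documentText.append(text).append('\n');
            }
        }
        return documentText.toString();
    }
}
```

To use it with the reading loop above, the lines could be appended to a StringBuilder instead of printed, and the result passed to WordHtmlText.extractParagraphText. For anything beyond simple Word output, a dedicated HTML parser is likely the safer choice, since regular expressions do not handle malformed or nested markup well.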