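I am parsing a Microsoft Word document and have imported the Apache POI jar to read it. I want to extract the headings from the document, and I am filtering on the font sizes that the headings use.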
public void try1(POIFSFileSystem filestream) throws Exception
{
    HWPFDocument doc = new HWPFDocument(filestream);
    WordExtractor we = new WordExtractor(doc);
    Range range = doc.getRange();
    String[] paragraphs = we.getParagraphText();
    for (int i = 0; i < paragraphs.length; i++)
    {
        Paragraph pr = range.getParagraph(i);
        int k = 0;
        if (pr.text().trim().length() != 0)
        {
            while (true)
            {
                System.out.println(k);
                CharacterRun run = pr.getCharacterRun(k++);
                /*System.out.println("Word is "+pr.text());
                System.out.println("Color: " + run.getColor());
                System.out.println("Font: " + run.getFontName());
                System.out.println("Font Size: " + run.getFontSize());*/
                System.out.println(pr.text());
                System.out.println(run.getEndOffset() + " " + pr.getEndOffset());
                if (run.getFontSize() == 26 || run.getFontSize() == 24)
                {
                    System.out.println("Selected One is " + pr.text());
                }
                if (run.getEndOffset() == pr.getEndOffset())
                    break;
            }
        }
    }
}
I get this exception:
java.lang.IllegalArgumentException: The end (7905) must not be before the start (15721)
    at org.apache.poi.hwpf.usermodel.Range.sanityCheckStartEnd(Range.java:247)
    at org.apache.poi.hwpf.usermodel.Range.<init>(Range.java:181)
    at org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:98)
    at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:791)
    at com.honeywell.corept.srd.ReadDocFileFromJava.try1(ReadDocFileFromJava.java:122)
    at com.honeywell.corept.srd.ReadDocFileFromJava.main(ReadDocFileFromJava.java:24)
Line 122 of the Java file is: CharacterRun run = pr.getCharacterRun(k++);
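
For reference, here is a minimal sketch of the same font-size filter written with a counted loop instead of while (true), so the run index can never go past the number of runs. It is deliberately independent of POI so it can be run on its own. The class name HeadingFilter and the method hasHeadingSizedRun are hypothetical names made up for this illustration, and nothing here establishes that an unbounded index is what actually triggers the exception above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

// Hypothetical helper, not part of Apache POI.
final class HeadingFilter {

    /**
     * Returns true if at least one run has a font size contained in headingSizes.
     *
     * runCount      number of runs in the paragraph (assumed to be known up front)
     * fontSizeOfRun maps a run index in [0, runCount) to that run's font size
     * headingSizes  the font sizes treated as headings
     */
    static boolean hasHeadingSizedRun(int runCount,
                                      IntUnaryOperator fontSizeOfRun,
                                      Set<Integer> headingSizes) {
        // Counted loop: the index stays strictly below runCount.
        for (int k = 0; k < runCount; k++) {
            if (headingSizes.contains(fontSizeOfRun.applyAsInt(k))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in data: font sizes of three runs in one paragraph.
        int[] runSizes = {20, 26, 20};
        Set<Integer> headingSizes = new HashSet<>(Arrays.asList(26, 24));

        // prints true, because the second run has size 26
        System.out.println(hasHeadingSizedRun(runSizes.length, k -> runSizes[k], headingSizes));
    }
}
```

With HWPF, the call would presumably look like hasHeadingSizedRun(pr.numCharacterRuns(), k -> pr.getCharacterRun(k).getFontSize(), headingSizes), with the outer loop bounded by range.numParagraphs() rather than by the length of we.getParagraphText(). As far as I know, HWPF reports font sizes in half-points, so headings set in 24 pt or 26 pt may need 48 and 52 in the set; that is worth checking against the POI version in use.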