Unexpected RuntimeException from Tika

Time: 2016-03-15 13:36:05

Tags: java apache parsing apache-tika

I am trying to extract the content of a large dataset of mixed file types (pdf, doc, ppt).

I am using tika-app-1.12.jar. When I run my code, everything works perfectly at first, and then I get this error:

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@3ea25501
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at recruitmentprototyp.RecruitmentPrototyp.tikareadDoc(RecruitmentPrototyp.java:135)
        at recruitmentprototyp.RecruitmentPrototyp.doForAll(RecruitmentPrototyp.java:110)
        at recruitmentprototyp.RecruitmentPrototyp.main(RecruitmentPrototyp.java:897)
Caused by: java.lang.IllegalStateException: Pap style 19 claimed to have itself as its parent, which isn't allowed
        at org.apache.poi.hwpf.model.StyleSheet.createPap(StyleSheet.java:232)
        at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:120)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:81)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:201)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:172)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
        ... 5 more
Java Result: 1

What should I do?
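
The trace shows the failure starting in POI's HWPFDocument while it reads the style sheet of one Word file, so a single malformed .doc appears to abort the whole run. One possible way to keep the batch going is to isolate failures per file, sketched below on the assumption that skipping and recording the bad file is acceptable. BatchExtractor, TextExtractor and extractAll are hypothetical names used only for illustration and are not part of Tika; the extractor is a plain stand-in so the example runs without Tika on the classpath.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchExtractor {

    /** Hypothetical stand-in for the real extraction call (not a Tika type). */
    @FunctionalInterface
    public interface TextExtractor {
        String extract(String path) throws Exception;
    }

    /** Outcome of a batch run: extracted text per file, plus per-file failures. */
    public static final class Result {
        public final Map<String, String> texts = new LinkedHashMap<>();
        public final Map<String, Exception> failures = new LinkedHashMap<>();
    }

    /** Runs the extractor on every path; a failing file is recorded, not rethrown. */
    public static Result extractAll(List<String> paths, TextExtractor extractor) {
        Result result = new Result();
        for (String path : paths) {
            try {
                result.texts.put(path, extractor.extract(path));
            } catch (Exception e) {
                // TikaException is a checked Exception, so it would land here too,
                // along with any RuntimeException the parser lets through.
                result.failures.put(path, e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake extractor that fails on one file, imitating a corrupt .doc.
        TextExtractor fake = path -> {
            if (path.equals("broken.doc")) {
                throw new IllegalStateException("simulated corrupt style sheet");
            }
            return "content of " + path;
        };

        Result result = extractAll(Arrays.asList("a.pdf", "broken.doc", "c.ppt"), fake);
        System.out.println("Extracted: " + result.texts.keySet());
        for (Map.Entry<String, Exception> f : result.failures.entrySet()) {
            System.out.println("Skipped " + f.getKey() + ": " + f.getValue().getMessage());
        }
    }
}
```

With Tika on the classpath, the stand-in could presumably be replaced by something like `path -> new Tika().parseToString(new File(path))`, using the `org.apache.tika.Tika` facade. The recorded failures then give a list of files to inspect or re-save by hand.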

0 answers:

No answers