I have a strange problem with Apache Tika. When I first detect the file type and then parse the file, I get no content. Code:
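```java
import java.io.*;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.*;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class TikaTest { // wrapper class name is arbitrary
    public static void main(String[] args) throws IOException, SAXException, TikaException {
        File file = new File("sample.txt");
        InputStream is = new FileInputStream(file);
        TikaInputStream objectData = TikaInputStream.get(is);
        Parser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
        TikaConfig config = TikaConfig.getDefaultConfig();
        Detector detector = config.getDetector();
        System.out.println("Hello : " + detector.detect(objectData, metadata).toString());
        parser.parse(is, handler, metadata, context);
        System.out.println("File Content :" + handler.toString());
    }
}
```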
But when I parse first and then detect the file type, I get the correct content. Code:
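```java
import java.io.*;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.*;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class TikaTest { // wrapper class name is arbitrary
    public static void main(String[] args) throws IOException, SAXException, TikaException {
        File file = new File("sample.txt");
        InputStream is = new FileInputStream(file);
        TikaInputStream objectData = TikaInputStream.get(is);
        Parser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        parser.parse(is, handler, metadata, context);
        System.out.println("File Content :" + handler.toString());
        metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
        TikaConfig config = TikaConfig.getDefaultConfig();
        Detector detector = config.getDetector();
        System.out.println("Hello : " + detector.detect(objectData, metadata).toString());
    }
}
```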
Why does this happen? Is there any way around it? I need to process the text differently depending on the detected MIME type.
Edit: I think this might be a dependency problem. If Tika's detection relies on the referenced libraries, which dependencies does it need?
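The symptom looks more like a stream-consumption issue than a missing dependency, since the second snippet shows the parsers are present. As far as I understand Tika's behavior, `TikaInputStream.get(is)` wraps `is`, and because `FileInputStream` does not support mark/reset, a buffering stream sits in between. `detector.detect(objectData, metadata)` marks `objectData`, reads its first bytes and resets it, but the buffer fills itself by pulling bytes out of `is`; for a small file like `sample.txt` that is likely all of it. `parser.parse(is, ...)` then reads an already exhausted `is` and produces no content. In the reverse order, parsing `is` works, and detection probably still reports a type from the `sample.txt` name set via `RESOURCE_NAME_KEY` even though it sees no bytes. A plausible fix is to pass the same wrapped stream to both calls, i.e. `parser.parse(objectData, handler, metadata, context)`, and not read `is` directly once it has been wrapped.

The JDK-only sketch below reproduces that mechanism without Tika, under the assumption that a plain `BufferedInputStream` behaves like the buffering inside `TikaInputStream`. `WrappedStreamDemo`, `peek` (a stand-in for the detector's mark/read/reset) and `peekThenRead` are hypothetical names for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WrappedStreamDemo {

    // Hypothetical stand-in for a type detector: mark the stream, read the
    // first n bytes, then reset so those bytes can be read again from it.
    static void peek(InputStream in, int n) throws IOException {
        in.mark(n);
        in.readNBytes(n); // Java 11+
        in.reset();
    }

    // Wraps a raw stream once (roughly what TikaInputStream.get(is) appears
    // to do for a stream without mark support), peeks through the wrapper,
    // then returns whatever is left in either the wrapper or the raw stream.
    static String peekThenRead(String text, boolean readFromWrapper) {
        try {
            InputStream raw = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
            InputStream wrapper = new BufferedInputStream(raw);
            peek(wrapper, 4);
            // Bytes the wrapper buffered during peek() are no longer in raw;
            // for short input that is all of them.
            InputStream source = readFromWrapper ? wrapper : raw;
            return new String(source.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Like the first snippet: detect via the wrapper, parse the raw stream.
        System.out.println("raw:     '" + peekThenRead("hello tika", false) + "'");
        // Suggested order: detect and parse through the same wrapper.
        System.out.println("wrapper: '" + peekThenRead("hello tika", true) + "'");
    }
}
```

Running `main` should print an empty string for the raw stream and `hello tika` for the wrapper: the raw stream has lost the bytes the wrapper buffered during `peek`, while the wrapper replays them after `reset()`. That mirrors why parsing `is` after detecting on `objectData` comes back empty.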