Question

I have a strange problem with Apache Tika. When I detect the file type first and then parse the file, I get no content. Code:

public static void main(String[] args) throws IOException, SAXException, TikaException {

    File file = new File("sample.txt");

    InputStream is = new FileInputStream(file);
    TikaInputStream objectData = TikaInputStream.get(is);
    Parser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler(-1);
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();

    metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
    TikaConfig config = TikaConfig.getDefaultConfig();
    Detector detector = config.getDetector();
    System.out.println("Hello : " + detector.detect(objectData, metadata).toString());

    parser.parse(is, handler, metadata, context);

    System.out.println("File Content :" + handler.toString());

}

But when I parse first and then detect the file type, I get the correct content. Code:

public static void main(String[] args) throws IOException, SAXException, TikaException {

    File file = new File("sample.txt");

    InputStream is = new FileInputStream(file);
    TikaInputStream objectData = TikaInputStream.get(is);
    Parser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler(-1);
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();

    parser.parse(is, handler, metadata, context);

    System.out.println("File Content :" + handler.toString());

    metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
    TikaConfig config = TikaConfig.getDefaultConfig();
    Detector detector = config.getDetector();
    System.out.println("Hello : " + detector.detect(objectData, metadata).toString());

}

Why does this happen? Is there a way around it? I need to process the text according to the detected MIME type.
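One possible explanation, offered as an assumption rather than a confirmed diagnosis: `objectData` wraps `is`, so when the detector peeks at `objectData`, the wrapper may pull the bytes of a small file out of `is` into its own buffer. The detector rewinds the wrapper, not the raw stream, so a later `parser.parse(is, ...)` would find `is` already drained. The sketch below reproduces that effect with only the JDK: `BufferedInputStream` stands in for `TikaInputStream`, and `peekFirstByte` is a hypothetical stand-in for `Detector.detect`. It does not call Tika at all.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamReuseDemo {

    // Hypothetical stand-in for a detector: peek at the first byte,
    // then rewind the stream that was passed in.
    static int peekFirstByte(InputStream in) throws IOException {
        in.mark(16);
        int first = in.read();
        in.reset();
        return first;
    }

    // Read everything that is still available on the stream.
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Same order as the failing snippet: peek through the wrapper,
    // then read from the raw stream underneath it.
    static String peekWrapperThenReadRaw(byte[] data) throws IOException {
        InputStream raw = new ByteArrayInputStream(data);
        InputStream wrapper = new BufferedInputStream(raw);
        peekFirstByte(wrapper);
        // For small inputs the wrapper has already buffered every byte of raw.
        return readAll(raw);
    }

    // Possible fix: keep using the wrapper for both the peek and the read.
    static String peekWrapperThenReadWrapper(byte[] data) throws IOException {
        InputStream raw = new ByteArrayInputStream(data);
        InputStream wrapper = new BufferedInputStream(raw);
        peekFirstByte(wrapper);
        return readAll(wrapper);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "sample text".getBytes(StandardCharsets.UTF_8);
        System.out.println("raw after peek:     [" + peekWrapperThenReadRaw(data) + "]");
        System.out.println("wrapper after peek: [" + peekWrapperThenReadWrapper(data) + "]");
    }
}
```

If the same mechanism applies here, passing `objectData` instead of `is` to `parser.parse` in the first snippet would be the change to try. It might also explain why the second snippet appears to work: the parser reads `is` in full, and the detector is then left with little more than the resource name to go on.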

Edit: I think this is a dependency issue. If Tika's detection depends on the referenced libraries, which dependencies does it need?

Apache Tika content issue

0 answers: