Detecting the file extension with Apache Tika corrupts the file

Time: 2018-08-07 05:49:20

Tags: java apache inputstream apache-tika

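I am trying to detect the file extension of a file that is passed in as an InputStream. The extension is detected correctly, but the file tends to be corrupted afterwards. This is my method for detecting the extension: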

public static Optional<String> detectFileExtension(InputStream inputStream) {

    // To provide mark/reset functionality to the stream required by Tika.
    InputStream bufferedInputStream = new BufferedInputStream(inputStream);

    String extension = null;
    try {
        MimeTypes mimeRepository = getMimeRepository();

        MediaType mediaType = mimeRepository.detect(bufferedInputStream, new Metadata());
        MimeType mimeType = mimeRepository.forName(mediaType.toString());
        extension = mimeType.getExtension();
        log.info("File Extension detected: {}", extension);

        // Need to reset input stream pos marker since it was updated while detecting the extension
        inputStream.reset();
        bufferedInputStream.close();

    } catch (MimeTypeException | IOException ignored) {
        log.error("Unable to detect extension of the file from the provided stream");
    }
    return Optional.ofNullable(extension);
}

private static MimeTypes getMimeRepository() {
    TikaConfig config = TikaConfig.getDefaultConfig();
    return config.getMimeRepository();
}

Now, when I try to save the file after detecting the extension, using the same InputStream again like this:

byte[] documentContentByteArray = IOUtils.toByteArray(inputStream);

Optional<String> extension = FileTypeHelper.detectFileExtension(inputStream);
if (extension.isPresent()) {
    fileName = fileName + extension.get();
} else {
    log.warn("File: {} does not have a valid extension", fileName);
}
File file = new File("/tmp/" + fileName);
FileUtils.writeByteArrayToFile(file, documentContentByteArray);

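It creates a file, but the file is corrupted. My guess is that the stream is not being reset properly after it is consumed in detectFileExtension. If anyone has done this before, some guidance would be great.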
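
The suspicion about the reset can be checked without Tika. The following is only a sketch, assuming the caller's stream is an ordinary InputStream: a BufferedInputStream reads ahead from the stream it wraps, so resetting the wrapper does not rewind the original, and the base InputStream.reset() throws an IOException when mark/reset is not supported. The class and method names (ReadAheadDemo and its helpers) are made up for illustration and use only java.io.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReadAheadDemo {

    /** Bytes still unread in the ORIGINAL stream after peeking through a buffered wrapper. */
    static int leftInOriginalAfterPeek(byte[] data, int peekLength) {
        try {
            InputStream original = new ByteArrayInputStream(data);
            BufferedInputStream buffered = new BufferedInputStream(original);
            buffered.mark(peekLength);
            buffered.read(new byte[peekLength]); // fills the wrapper's buffer from 'original'
            buffered.reset();                    // rewinds the wrapper only
            return original.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes still readable through the WRAPPER after the same peek and reset. */
    static int leftInWrapperAfterPeek(byte[] data, int peekLength) {
        try {
            BufferedInputStream buffered =
                    new BufferedInputStream(new ByteArrayInputStream(data));
            buffered.mark(peekLength);
            buffered.read(new byte[peekLength]);
            buffered.reset();
            return buffered.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if reset() fails on a stream that does not override mark/reset. */
    static boolean resetFailsWithoutMarkSupport() {
        InputStream plain = new InputStream() {
            @Override
            public int read() {
                return -1; // empty stream, inherits the default mark/reset behaviour
            }
        };
        try {
            plain.reset();
            return false;
        } catch (IOException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        System.out.println("left in original: " + leftInOriginalAfterPeek(data, 4));
        System.out.println("left in wrapper:  " + leftInWrapperAfterPeek(data, 4));
        System.out.println("plain reset fails: " + resetFailsWithoutMarkSupport());
    }
}
```

With a 100-byte input and the default buffer size, the first method returns 0 and the second returns 100: the data now sits in the wrapper's buffer, so any later read has to go through the wrapper. Closing the wrapper also closes the stream it wraps.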

1 answer:

Answer 0 (score: 1)

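I fixed it by not using the same input stream over and over. I read the initial stream into a byte array to create the file, and created a new stream over those bytes to pass to the extension detection.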

byte[] documentContentByteArray = IOUtils.toByteArray(inputStream);

// Extension detection: read from a fresh stream over the buffered bytes
InputStream extensionDetectionInputStream = new ByteArrayInputStream(documentContentByteArray);
Optional<String> extension = FileTypeHelper.detectFileExtension(extensionDetectionInputStream);
if (extension.isPresent()) {
    fileName = fileName + extension.get();
} else {
    log.warn("File: {} does not have a valid extension", fileName);
}
extensionDetectionInputStream.close();

//File creation
File file = new File("/tmp/" + fileName);
FileUtils.writeByteArrayToFile(file, documentContentByteArray);

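If there is a better way to reuse the same stream, that would be great and I would happily accept that answer. For now, I am marking this one as the accepted answer.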
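
One possible way to reuse a single stream, offered as a sketch rather than a tested Tika recipe: wrap the incoming stream in a BufferedInputStream exactly once, let the detector peek at that wrapper, and keep reading from the same wrapper afterwards instead of going back to the original. My understanding of the Tika Javadoc is that Detector.detect expects a stream that supports marks and restores its position itself, so no explicit reset or close should be needed inside the detection method. To stay self-contained, the code below replaces Tika with a hypothetical stand-in, sniffExtension, which only knows two magic numbers; PeekThenReuse and Result are likewise invented names. It needs Java 11 or later for readNBytes(int).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class PeekThenReuse {

    // Upper bound on how many bytes the stand-in detector reads before reset().
    private static final int PEEK_LIMIT = 8;

    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F'};
    private static final byte[] PNG_MAGIC = {(byte) 0x89, 'P', 'N', 'G'};

    /** Detection result plus the complete, untouched content. */
    static final class Result {
        final Optional<String> extension;
        final byte[] content;

        Result(Optional<String> extension, byte[] content) {
            this.extension = extension;
            this.content = content;
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Hypothetical stand-in for a real detector: peeks at the first bytes
     * and restores the stream position before returning.
     */
    static Optional<String> sniffExtension(InputStream markable) throws IOException {
        if (!markable.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        markable.mark(PEEK_LIMIT);
        try {
            byte[] head = markable.readNBytes(PEEK_LIMIT);
            if (startsWith(head, PDF_MAGIC)) {
                return Optional.of(".pdf");
            }
            if (startsWith(head, PNG_MAGIC)) {
                return Optional.of(".png");
            }
            return Optional.empty();
        } finally {
            markable.reset(); // back to where the caller left the stream
        }
    }

    /** Wraps once, detects on the wrapper, then reads everything from that same wrapper. */
    static Result detectThenReadAll(InputStream source) {
        try (InputStream buffered = new BufferedInputStream(source)) {
            Optional<String> extension = sniffExtension(buffered);
            byte[] content = buffered.readAllBytes(); // starts at byte 0 again
            return new Result(extension, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.7 sample body".getBytes(StandardCharsets.US_ASCII);
        Result result = detectThenReadAll(new ByteArrayInputStream(pdf));
        System.out.println(result.extension.orElse("(none)") + ", " + result.content.length + " bytes");
    }
}
```

The design point is that the original stream is never touched after wrapping, so it does not matter whether it supports mark/reset. Compared with copying everything into a byte array first, this avoids needing a second stream, at the cost of having to pass the wrapper (not the original) to whatever writes the file.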