Error extracting a .tgz file with Apache Commons Compress

Posted: 2018-10-11 08:06:22

Tags: scala file-io gzip apache-commons-compress

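In the code below I am trying to extract a .tgz file with the Apache Commons Compress library, but I get an exception saying that it does not like the file format. Here is the snippet; the method below is part of a source file named FileUtils.scala: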

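import java.io.{BufferedInputStream, BufferedOutputStream, File, FileInputStream, FileOutputStream}

import scala.annotation.tailrec
import scala.util.Try

import org.apache.commons.compress.archivers.tar.{TarArchiveEntry, TarArchiveInputStream}
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream

object FileUtils {
  // BUFFER_SIZE is not shown in the original snippet; 4096 is an assumed value
  val BUFFER_SIZE: Int = 4096

  def extractTGZ(from: File, outputPath: String): Unit = {
    var fileCount: Int = 0
    var dirCount: Int = 0
    print(s"Extracting files from ${from.getAbsolutePath}")
    // GzipCompressorInputStream reads the gzip header in its constructor, so a
    // non-gzip input fails on this line, before the Try block below is entered
    val tais = new TarArchiveInputStream(
      new GzipCompressorInputStream(new BufferedInputStream(new FileInputStream(from))))
    Try {
      @tailrec
      def readTarArchiveEntry(entry: TarArchiveEntry): Unit = {
        println("Extracting file: " + entry.getName)
        // new File(parent, child) adds the separator that outputPath + name can miss
        val target = new File(outputPath, entry.getName)

        if (entry.isDirectory) {
          // Create directories as required
          target.mkdirs()
          dirCount += 1
        } else {
          // Some archives omit directory entries, so make sure the parent exists
          Option(target.getParentFile).foreach(_.mkdirs())
          val data = new Array[Byte](BUFFER_SIZE)
          val dest = new BufferedOutputStream(new FileOutputStream(target), BUFFER_SIZE)

          // tais.read returns the bytes of the current entry only, and -1 at its end
          var count = tais.read(data, 0, BUFFER_SIZE)
          while (count != -1) {
            dest.write(data, 0, count)
            count = tais.read(data, 0, BUFFER_SIZE)
          }
          dest.close()
          fileCount += 1
          if (fileCount % 1000 == 0) print(".")
        }

        // Check if we have some more files in the compressed archive
        val nextEntry = tais.getNextEntry.asInstanceOf[TarArchiveEntry]
        if (nextEntry != null) readTarArchiveEntry(nextEntry) else ()
      }

      // An empty archive has no first entry, so guard against null here too
      val firstEntry = tais.getNextEntry.asInstanceOf[TarArchiveEntry]
      if (firstEntry != null) readTarArchiveEntry(firstEntry)
    } recover {
      case t: Throwable =>
        println(s"Unexpected exception occurred when de-compressing files ${t.getMessage}")
    }
    // Close the archive stream on success as well as on failure
    tais.close()
    println("\n" + fileCount + " files and " + dirCount + " directories extracted to: " + outputPath)
  }
}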

Here is the error I get when I run it:

Extracting files from /Users/joe/ml-projects/housing-classification-example-scala/datasets/housing/raw/housing.tgz[error] (run-main-0) java.io.IOException: Input is not in the .gz format
[error] java.io.IOException: Input is not in the .gz format
[error]     at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.init(GzipCompressorInputStream.java:164)
[error]     at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.<init>(GzipCompressorInputStream.java:137)
[error]     at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.<init>(GzipCompressorInputStream.java:102)
[error]     at com.inland24.housingml.FileUtils$.extractTGZ(FileUtils.scala:20)
[error]     at com.inland24.housingml.FileUtils$.extractTGZ(FileUtils.scala:62)
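The trace shows the exception coming from the GzipCompressorInputStream constructor (`<init>` calling `init`): the gzip header is parsed as soon as the stream is created, so a non-gzip file fails at FileUtils.scala:20, before any tar entry is read and outside the `Try` in `extractTGZ`. The JDK's own `java.util.zip.GZIPInputStream` behaves the same way (it throws `ZipException: Not in GZIP format`), which allows a dependency-free reproduction. A minimal sketch; the object `GzipHeaderDemo` and its helpers are hypothetical names, and the sample bytes are made up for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, IOException}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.util.{Failure, Success, Try}

object GzipHeaderDemo {
  // Hypothetical helper: true if the bytes can be opened as a gzip stream.
  // GZIPInputStream parses the gzip header inside its constructor, so invalid
  // input fails right here, before any payload is read.
  def opensAsGzip(bytes: Array[Byte]): Boolean =
    Try(new GZIPInputStream(new ByteArrayInputStream(bytes))) match {
      case Success(in)             => in.close(); true
      case Failure(_: IOException) => false // ZipException and EOFException are IOExceptions
      case Failure(other)          => throw other
    }

  // Compress bytes with the JDK gzip writer to produce valid sample input
  def gzip(data: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(data)
    gz.close()
    bos.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val plain = "not compressed at all".getBytes("UTF-8")
    println(opensAsGzip(gzip(plain))) // true
    println(opensAsGzip(plain))       // false
  }
}
```

The same reasoning applies to GzipCompressorInputStream: whatever `housing.tgz` contains, its first bytes are not a gzip header.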

Any clue why it does not accept this .tgz file? My understanding is that .tgz is equivalent to .tar.gz, and that GzipCompressorInputStream should handle both without trouble. Any ideas?
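A .tgz file is normally a tar archive compressed with gzip, so it can be checked layer by layer: a gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952), and a POSIX/GNU tar header has the string "ustar" at byte offset 257 of its first 512-byte block. One possible cause, which the question does not confirm, is a download that saved something else (for example an HTML error page, whose first bytes would be `3c 21`, i.e. "<!") under the .tgz name. A minimal sketch using only the JDK; `TgzCheck` and its methods are hypothetical names, not part of Commons Compress.

```scala
import java.io.{BufferedInputStream, File, FileInputStream, InputStream}
import java.util.zip.GZIPInputStream

object TgzCheck {
  // gzip streams start with the two magic bytes 0x1f 0x8b (RFC 1952)
  def hasGzipMagic(head: Array[Byte]): Boolean =
    head.length >= 2 && (head(0) & 0xff) == 0x1f && (head(1) & 0xff) == 0x8b

  // POSIX/GNU tar headers carry "ustar" at offset 257; old V7 archives do not,
  // so a miss here is a hint, not proof
  def hasUstarMagic(block: Array[Byte]): Boolean =
    block.length >= 262 && new String(block, 257, 5, "US-ASCII") == "ustar"

  // Read up to n bytes, fewer if the stream ends first
  def readUpTo(in: InputStream, n: Int): Array[Byte] = {
    val buf = new Array[Byte](n)
    var off = 0
    var r = 0
    while (off < n && { r = in.read(buf, off, n - off); r != -1 }) off += r
    buf.take(off)
  }

  // Describe the stream's layers; the caller owns and closes `in`
  def describe(in: InputStream): String = {
    val buffered = new BufferedInputStream(in)
    buffered.mark(16)
    val head = readUpTo(buffered, 16)
    buffered.reset()
    if (!hasGzipMagic(head)) {
      val hex = head.map(b => f"${b & 0xff}%02x").mkString(" ")
      s"not gzip; first bytes: $hex"
    } else {
      val block = readUpTo(new GZIPInputStream(buffered), 512)
      if (hasUstarMagic(block)) "gzip-compressed tar (ustar)"
      else "gzip, but the content does not start with a ustar tar header"
    }
  }

  def describeFile(f: File): String = {
    val in = new FileInputStream(f)
    try describe(in) finally in.close()
  }
}
```

For instance, printing `TgzCheck.describeFile(from)` before the streams are opened in `extractTGZ` would show which layer is missing, and the hex prefix often identifies what the file really is.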

0 answers