Decompressing and reading gz files from S3 - Scala

Time: 2019-10-15 18:22:02

Tags: java scala amazon-s3 gzip inputstream

I have a list of gzip files in an S3 folder and need to read them using Scala. I loop over each file and store its contents in a list of String buffers.
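The loop described above could look roughly like the following sketch. It uses only the standard library so that it runs on its own: the `open` function is a hypothetical stand-in for whatever returns the object's content stream (for example `s3Client.getObject(bucket, key).getObjectContent`), and the in-memory `store` map with its keys is made up for illustration. UTF-8 text is also an assumption.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.io.Source

object GzipBatchSketch {
  // Helper used only to build sample data: gzip a string in memory.
  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    try out.write(text.getBytes(StandardCharsets.UTF_8)) finally out.close()
    buffer.toByteArray
  }

  // Decompress one stream fully into a String, closing it when done.
  // Assumption: the content is UTF-8 text.
  def readGzip(in: InputStream): String = {
    val gz = new GZIPInputStream(in)
    try Source.fromInputStream(gz, "UTF-8").mkString finally gz.close()
  }

  // Visit every key in order and collect the decompressed contents.
  // `open` is a hypothetical stand-in for fetching the S3 object stream.
  def readAll(keys: Seq[String], open: String => InputStream): List[String] =
    keys.map(key => readGzip(open(key))).toList

  def main(args: Array[String]): Unit = {
    // In-memory stand-in for the S3 folder (hypothetical keys).
    val store = Map(
      "logs/a.gz" -> gzip("first file"),
      "logs/b.gz" -> gzip("second file")
    )
    val contents =
      readAll(store.keys.toSeq.sorted, key => new ByteArrayInputStream(store(key)))
    contents.foreach(println)
  }
}
```

Passing the stream source as a function keeps the loop independent of the S3 client, which also makes it easy to try out locally.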

Here is my method that reads a single file and returns its contents as a String:

  def getDecompressedData(bucket: String, key: String) : String= {
     val getObjectRequest = new GetObjectRequest(bucket, key)
     val s3Object = s3Client.getObject(getObjectRequest)
     val byteArray = IOUtils.toByteArray(s3Object.getObjectContent)
     val inputStream = new GZIPInputStream(new ByteArrayInputStream(byteArray))
     val data = scala.io.Source.fromInputStream(inputStream).mkString
     inputStream.close()
     data
  }

I get the following error:

Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at com.amazonaws.util.IOUtils.toByteArray(IOUtils.java:44)
    at com.amazonaws.util.IOUtils.toString(IOUtils.java:58)

The error points to this line:

val data = scala.io.Source.fromInputStream(inputStream).mkString
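For context, `GZIPInputStream` raises this `EOFException` when the compressed input ends before the deflate stream is complete. Whether that is what happens with the S3 objects in the question is an assumption. The standard-library sketch below only reproduces the symptom by cutting a valid gzip payload in half. The object and helper names are made up for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.io.Source
import scala.util.Try

object TruncatedGzipSketch {
  // Compress a string into a complete gzip byte array.
  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    try out.write(text.getBytes(StandardCharsets.UTF_8)) finally out.close()
    buffer.toByteArray
  }

  // Decompress a gzip byte array into a String (assumes UTF-8 text).
  def gunzip(bytes: Array[Byte]): String = {
    val gz = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try Source.fromInputStream(gz, "UTF-8").mkString finally gz.close()
  }

  def main(args: Array[String]): Unit = {
    val original = (1 to 1000).map(i => s"line $i").mkString("\n")
    val complete = gzip(original)
    // Simulate an incomplete download or upload by dropping the second half.
    val truncated = complete.take(complete.length / 2)

    println(gunzip(complete) == original) // the complete payload round-trips
    println(Try(gunzip(truncated)))       // the truncated payload fails with an EOFException
  }
}
```

If the same exception shows up for a real object, it may be worth checking that the number of bytes read matches the object's content length and that the object itself was uploaded completely.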

1 answer:

Answer 0 (score: 1)

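  import java.util.zip.GZIPInputStream
  import scala.io.Source
  import com.amazonaws.services.s3.model.GetObjectRequest

  // s3Client is the AmazonS3 client already defined in the question's code
  def getDecompressedData(bucket: String, key: String): String = {
     val getObjectRequest = new GetObjectRequest(bucket, key)
     val s3Object = s3Client.getObject(getObjectRequest)

     // Assumption: the object is gzip-compressed when its key ends in ".gz"
     val gzip = key.endsWith(".gz")

     try {
        if (gzip) {
           // Compressed file: wrap the S3 stream directly so it is decompressed while being read
           Source.fromInputStream(new GZIPInputStream(s3Object.getObjectContent)).mkString
        } else {
           // Plain file: read the S3 stream as text
           Source.fromInputStream(s3Object.getObjectContent).mkString
        }
     } finally {
        // Closing the S3Object also closes its content stream
        s3Object.close()
     }
  }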

I have provided code for both gzipped and plain files.
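As an alternative to deciding up front whether a file is compressed, the stream itself can be inspected: gzip data starts with the two magic bytes `0x1f 0x8b`. The sketch below is a standard-library illustration of that idea, not part of the answer's method. The names `looksGzipped` and `readText` are made up here, and UTF-8 text is assumed.

```scala
import java.io.{BufferedInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.io.Source

object GzipDetectSketch {
  // Helper used only to build sample data: gzip a string in memory.
  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    try out.write(text.getBytes(StandardCharsets.UTF_8)) finally out.close()
    buffer.toByteArray
  }

  // Peek at the first two bytes, compare them with the gzip magic number,
  // then rewind so the caller still sees the stream from its start.
  def looksGzipped(in: BufferedInputStream): Boolean = {
    in.mark(2)
    val first = in.read()
    val second = in.read()
    in.reset()
    first == 0x1f && second == 0x8b
  }

  // Read the whole stream as text, decompressing only when it looks gzipped.
  def readText(raw: InputStream): String = {
    val buffered = new BufferedInputStream(raw)
    try {
      val in: InputStream =
        if (looksGzipped(buffered)) new GZIPInputStream(buffered) else buffered
      try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
    } finally buffered.close()
  }
}
```

With the S3 client from the question, the call would presumably be `GzipDetectSketch.readText(s3Object.getObjectContent)`. A plain text file that happens to begin with those two bytes would be misdetected, so this is a heuristic, not a guarantee.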