Decompressing a .gz file from a mid-file offset

Posted: 2018-02-26 07:33:19

Tags: scala akka gzip akka-stream

I am using Akka Streams to stream a .gz file from an AWS S3 bucket; the application reads the data in chunks and processes them.

The code below works fine when I read the file from the beginning, but when I start from a specific offset, decompression fails with a ParsingException.

Code:

import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.http.scaladsl.model.headers.ByteRange
import akka.stream.alpakka.s3.scaladsl.S3Client
import akka.stream.alpakka.s3.{MemoryBufferType, S3Settings}
import akka.stream.scaladsl.{Compression, Flow, Keep, Sink, Source}
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, KillSwitches}
import akka.util.{ByteString, Timeout}
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}

import scala.concurrent.duration._
import scala.concurrent.{ExecutionContextExecutor, Future}
import scala.util.Success

object StreamApp extends App {

    // Run 1: works
    //startStreaming(0)
    // Run 2: fails with ParsingException
    startStreaming(33339)

    def startStreaming(pointer: Long): Unit = {
        println("Stream App Starting")

        implicit val system = ActorSystem()
        val materializerSettings = ActorMaterializerSettings(system)
        implicit val materializer = ActorMaterializer(materializerSettings)
        implicit val dispatcher: ExecutionContextExecutor = system.dispatcher
        implicit val timeout: Timeout = Timeout(1.second)

        val accessKey = "aws-access-key"
        val secretAccessKey = "aws-secret-access-key"

        val awsCredentials = new AWSStaticCredentialsProvider(
            new BasicAWSCredentials(accessKey, secretAccessKey)
        )

        val s3Region: String = "s3-region-name"
        val s3Bucket: String = "s3-bucket-name"
        val s3DataFile: String = "s3-object-path.gz"

        val settings = new S3Settings(MemoryBufferType, None, awsCredentials, s3Region, false)

        val s3Client = new S3Client(settings)(system, materializer)

        // Download only the bytes from `pointer` to the end of the object.
        val currentOffset: Long = pointer
        val source: Source[ByteString, NotUsed] =
            s3Client
                .download(s3Bucket, s3DataFile, ByteRange.fromOffset(currentOffset))

        // Decompress the gzip-compressed byte stream.
        val flowDecompress: Flow[ByteString, ByteString, NotUsed] =
            Flow[ByteString].via(
                Compression.gunzip()
            )

        val flowToString: Flow[ByteString, String, NotUsed] =
            Flow[ByteString].map(_.utf8String)

        val sink: Sink[String, Future[Done]] = Sink.foreach(println)

        val (killSwitch, graph) =
            source
                .via(flowDecompress)
                .via(flowToString)
                .viaMat(KillSwitches.single)(Keep.right)
                .toMat(sink)(Keep.both)
                .run()

        // Note: this partial function has no Failure case, which is why a
        // stream failure surfaces as a scala.MatchError in the log below.
        graph.onComplete {
            case Success(_) =>
                println("Stream App >> File Data Extractor >> Stream completed successfully")
                killSwitch.shutdown()
        }
    }
}

The exception is as follows:

[ERROR] [02/26/2018 12:13:10.682] [default-akka.actor.default-dispatcher-2] [akka.dispatch.Dispatcher] Failure(akka.stream.impl.io.ByteStringParser$ParsingException: Parsing failed in step ReadHeaders) (of class scala.util.Failure)
scala.MatchError: Failure(akka.stream.impl.io.ByteStringParser$ParsingException: Parsing failed in step ReadHeaders) (of class scala.util.Failure)
    at StreamApp$$anonfun$startStreaming$1.apply(StreamApp.scala:71)
    at StreamApp$$anonfun$startStreaming$1.apply(StreamApp.scala:71)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I debugged the code, and here are my observations:

When execution reaches this line, readByte() returns 3, so the parser cannot recognize the input as gzip (a gzip stream must begin with the magic bytes 0x1f 0x8b).
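To illustrate, I can reproduce the same ReadHeaders failure locally without S3: the gzip header exists only at the very start of the file, so a byte range starting mid-file hands Compression.gunzip() data with no header. A minimal sketch (all names here are my own placeholders, not from the code above):

import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Compression, Sink, Source}
import akka.util.ByteString

object GunzipOffsetDemo extends App {
    implicit val system = ActorSystem("demo")
    implicit val materializer = ActorMaterializer()
    import system.dispatcher

    // Gzip some sample data in memory.
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(("hello akka streams " * 100).getBytes("UTF-8"))
    gz.close()
    val compressed = ByteString(bos.toByteArray)

    // Gunzip the compressed bytes, starting at the given offset.
    def gunzipFrom(offset: Int) =
        Source.single(compressed.drop(offset))
            .via(Compression.gunzip())
            .runWith(Sink.fold(ByteString.empty)(_ ++ _))

    // Succeeds: the stream starts with the gzip magic bytes 0x1f 0x8b.
    gunzipFrom(0).foreach(bs => println(s"from offset 0: ${bs.length} bytes"))

    // Fails with "Parsing failed in step ReadHeaders": the bytes at an
    // arbitrary offset carry no gzip header.
    gunzipFrom(10).failed.foreach(e => println(s"from offset 10: $e"))
}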

In short, I am unable to decompress a chunk read from the middle of a .gz file. I would appreciate guidance on how to solve this; please point me in the right direction and correct me if I am wrong.

Versions:

Akka Version = "2.5.9"
Akka Http Version = "10.0.11"
Alpakka Version = "0.14"

1 Answer:

Answer 0 (score: 0)

From the docs: Compression.gunzip() creates a Flow that decompresses a gzip-compressed stream of data.

You can use the file utility on Linux or macOS to find out which algorithm was used to compress the file. In my case the file turned out to be zlib-compressed rather than gzip-compressed, so Compression.inflate() did the trick.
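As a sketch, swapping the decompression stage in your pipeline (reusing the same imports as in your code) would look like this:

// Decompress a zlib-wrapped (DEFLATE) byte stream instead of gzip.
val flowDecompress: Flow[ByteString, ByteString, NotUsed] =
    Flow[ByteString].via(Compression.inflate())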

See also this question for a good explanation of how zlib, zip, and gzip are related.
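For a quick local check of that relationship, the sketch below (names are my own) prints the leading bytes each wrapper puts in front of the same DEFLATE payload: gzip streams start with 0x1f 0x8b, while zlib streams typically start with 0x78:

import java.io.ByteArrayOutputStream
import java.util.zip.{Deflater, GZIPOutputStream}

object HeaderPeek extends App {
    val data = "sample".getBytes("UTF-8")

    // gzip wrapper: header + DEFLATE payload + trailer.
    val gzOut = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(gzOut)
    gz.write(data)
    gz.close()

    // zlib wrapper (java.util.zip.Deflater's default): header + DEFLATE payload.
    val deflater = new Deflater()
    deflater.setInput(data)
    deflater.finish()
    val zlibBuf = new Array[Byte](64)
    deflater.deflate(zlibBuf)

    def firstBytes(bytes: Array[Byte]) =
        bytes.take(2).map(b => f"0x$b%02x").mkString(" ")

    println(s"gzip starts with: ${firstBytes(gzOut.toByteArray)}") // 0x1f 0x8b
    println(s"zlib starts with: ${firstBytes(zlibBuf)}")           // typically 0x78 0x9c
}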