Decompressing a .gz file from a mid-file offset

Posted: 2018-02-26 07:33:19

Tags: scala akka gzip akka-stream

I am using Akka Streams to stream a .gz file from an AWS S3 bucket; the application reads the data in chunks and processes them.

The code below works fine when I read the file from the beginning, but when I start from a specific offset, decompression fails with a ParsingException.

Code:

import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.http.scaladsl.model.headers.ByteRange
import akka.stream.alpakka.s3.scaladsl.S3Client
import akka.stream.alpakka.s3.{MemoryBufferType, S3Settings}
import akka.stream.scaladsl.{Compression, Flow, Keep, Sink, Source}
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, KillSwitches}
import akka.util.{ByteString, Timeout}
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}

import scala.concurrent.duration._
import scala.concurrent.{ExecutionContextExecutor, Future}
import scala.util.Success

object StreamApp extends App {

    // Run 1: works
    //startStreaming(0)
    // Run 2: fails with ParsingException
    startStreaming(33339)

    def startStreaming(pointer: Long): Unit = {
        println("Stream App Starting")

        implicit val system = ActorSystem()
        val materializerSettings = ActorMaterializerSettings(system)
        implicit val materializer = ActorMaterializer(materializerSettings)
        implicit val dispatcher: ExecutionContextExecutor = system.dispatcher
        implicit val timeout: Timeout = Timeout(1.second)

        val accessKey = "aws-access-key"
        val secretAccessKey = "aws-secret-access-key"

        val awsCredentials = new AWSStaticCredentialsProvider(
            new BasicAWSCredentials(accessKey, secretAccessKey)
        )

        val s3Region: String = "s3-region-name"
        val s3Bucket: String = "s3-bucket-name"
        val s3DataFile: String = "s3-object-path.gz"

        val settings = new S3Settings(MemoryBufferType, None, awsCredentials, s3Region, false)

        val s3Client = new S3Client(settings)(system, materializer)

        // Download only the bytes from `pointer` to the end of the object.
        val currentOffset: Long = pointer
        val source: Source[ByteString, NotUsed] =
            s3Client
                .download(s3Bucket, s3DataFile, ByteRange.fromOffset(currentOffset))

        // Decompress the gzip-compressed byte stream.
        val flowDecompress: Flow[ByteString, ByteString, NotUsed] =
            Flow[ByteString].via(
                Compression.gunzip()
            )

        val flowToString: Flow[ByteString, String, NotUsed] =
            Flow[ByteString].map(_.utf8String)

        val sink: Sink[String, Future[Done]] = Sink.foreach(println)

        val (killSwitch, graph) =
            source
                .via(flowDecompress)
                .via(flowToString)
                .viaMat(KillSwitches.single)(Keep.right)
                .toMat(sink)(Keep.both)
                .run()

        // Note: this partial function has no Failure case, which is why a
        // stream failure surfaces as a scala.MatchError in the log below.
        graph.onComplete {
            case Success(_) =>
                println("Stream App >> File Data Extractor >> Stream completed successfully")
                killSwitch.shutdown()
        }
    }
}

The exception is as follows:

[ERROR] [02/26/2018 12:13:10.682] [default-akka.actor.default-dispatcher-2] [akka.dispatch.Dispatcher] Failure(akka.stream.impl.io.ByteStringParser$ParsingException: Parsing failed in step ReadHeaders) (of class scala.util.Failure)
scala.MatchError: Failure(akka.stream.impl.io.ByteStringParser$ParsingException: Parsing failed in step ReadHeaders) (of class scala.util.Failure)
    at StreamApp$$anonfun$startStreaming$1.apply(StreamApp.scala:71)
    at StreamApp$$anonfun$startStreaming$1.apply(StreamApp.scala:71)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I debugged the code, and here are my observations:

When execution reaches this line, readByte() returns 3, so the parser cannot recognize the input as gzip (a gzip stream must begin with the magic bytes 0x1f 0x8b).
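To illustrate, I can reproduce the same ReadHeaders failure locally without S3: the gzip header exists only at the very start of the file, so a byte range starting mid-file hands Compression.gunzip() data with no header. A minimal sketch (all names here are my own placeholders, not from the code above):

import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Compression, Sink, Source}
import akka.util.ByteString

object GunzipOffsetDemo extends App {
    implicit val system = ActorSystem("demo")
    implicit val materializer = ActorMaterializer()
    import system.dispatcher

    // Gzip some sample data in memory.
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(("hello akka streams " * 100).getBytes("UTF-8"))
    gz.close()
    val compressed = ByteString(bos.toByteArray)

    // Gunzip the compressed bytes, starting at the given offset.
    def gunzipFrom(offset: Int) =
        Source.single(compressed.drop(offset))
            .via(Compression.gunzip())
            .runWith(Sink.fold(ByteString.empty)(_ ++ _))

    // Succeeds: the stream starts with the gzip magic bytes 0x1f 0x8b.
    gunzipFrom(0).foreach(bs => println(s"from offset 0: ${bs.length} bytes"))

    // Fails with "Parsing failed in step ReadHeaders": the bytes at an
    // arbitrary offset carry no gzip header.
    gunzipFrom(10).failed.foreach(e => println(s"from offset 10: $e"))
}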

In short, I am unable to decompress a chunk read from the middle of a .gz file. I would appreciate guidance on how to solve this; please point me in the right direction and correct me if I am wrong.

Versions:

Akka Version = "2.5.9"
Akka Http Version = "10.0.11"
Alpakka Version = "0.14"

1 Answer:

Answer 0 (score: 0)

From the docs: Compression.gunzip() creates a Flow that decompresses a gzip-compressed stream of data.

You can use the file utility on Linux or macOS to find out which algorithm was used to compress the file. In my case the file turned out to be zlib-compressed rather than gzip-compressed, so Compression.inflate() did the trick.
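As a sketch, swapping the decompression stage in your pipeline (reusing the same imports as in your code) would look like this:

// Decompress a zlib-wrapped (DEFLATE) byte stream instead of gzip.
val flowDecompress: Flow[ByteString, ByteString, NotUsed] =
    Flow[ByteString].via(Compression.inflate())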

See also this question for a good explanation of how zlib, zip, and gzip are related.
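For a quick local check of that relationship, the sketch below (names are my own) prints the leading bytes each wrapper puts in front of the same DEFLATE payload: gzip streams start with 0x1f 0x8b, while zlib streams typically start with 0x78:

import java.io.ByteArrayOutputStream
import java.util.zip.{Deflater, GZIPOutputStream}

object HeaderPeek extends App {
    val data = "sample".getBytes("UTF-8")

    // gzip wrapper: header + DEFLATE payload + trailer.
    val gzOut = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(gzOut)
    gz.write(data)
    gz.close()

    // zlib wrapper (java.util.zip.Deflater's default): header + DEFLATE payload.
    val deflater = new Deflater()
    deflater.setInput(data)
    deflater.finish()
    val zlibBuf = new Array[Byte](64)
    deflater.deflate(zlibBuf)

    def firstBytes(bytes: Array[Byte]) =
        bytes.take(2).map(b => f"0x$b%02x").mkString(" ")

    println(s"gzip starts with: ${firstBytes(gzOut.toByteArray)}") // 0x1f 0x8b
    println(s"zlib starts with: ${firstBytes(zlibBuf)}")           // typically 0x78 0x9c
}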