I am using Akka Streams to stream a .gz file from an AWS S3 bucket; the application reads the data in chunks and processes them. The code below works fine when I read the file from the beginning, but if I start reading from a specific offset, decompression fails with a ParsingException.
Code:
import akka.actor.ActorSystem
import akka.http.scaladsl.model.headers.ByteRange
import akka.stream.alpakka.s3.scaladsl.S3Client
import akka.stream.alpakka.s3.{MemoryBufferType, S3Settings}
import akka.stream.scaladsl.{Compression, Flow, Keep, Sink, Source}
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, KillSwitches}
import akka.util.{ByteString, Timeout}
import akka.{Done, NotUsed}
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}

import scala.concurrent.duration._
import scala.concurrent.{ExecutionContextExecutor, Future}
import scala.util.Success

object StreamApp extends App {

  // Run 1
  //startStreaming(0)
  // Run 2
  startStreaming(33339)

  def startStreaming(pointer: Long): Unit = {
    println("Stream App Starting")

    implicit val system = ActorSystem()
    val materializerSettings = ActorMaterializerSettings(system)
    implicit val materializer = ActorMaterializer(materializerSettings)
    implicit val dispatcher: ExecutionContextExecutor = system.dispatcher
    implicit val timeout: Timeout = Timeout(1.seconds)

    val accessKey = "aws-access-key"
    val secretAccessKey = "aws-secret-access-key"
    val awsCredentials = new AWSStaticCredentialsProvider(
      new BasicAWSCredentials(accessKey, secretAccessKey)
    )
    val s3Region: String = "s3-region-name"
    val s3Bucket: String = "s3-bucket-name"
    val s3DataFile: String = "s3-object-path.gz"

    val settings = new S3Settings(MemoryBufferType, None, awsCredentials, s3Region, false)
    val s3Client = new S3Client(settings)(system, materializer)

    val currentOffset: Long = pointer
    val source: Source[ByteString, NotUsed] =
      s3Client
        .download(s3Bucket, s3DataFile, ByteRange.fromOffset(currentOffset))

    val flowDecompress: Flow[ByteString, ByteString, NotUsed] =
      Flow[ByteString].via(
        Compression.gunzip()
      )
    val flowToString: Flow[ByteString, String, NotUsed] =
      Flow[ByteString].map(_.utf8String)
    val sink: Sink[String, Future[Done]] = Sink.foreach(println)

    val (killSwitch, graph) =
      source
        .via(flowDecompress)
        .via(flowToString)
        .viaMat(KillSwitches.single)(Keep.right)
        .toMat(sink)(Keep.both)
        .run()

    graph.onComplete {
      case Success(_) =>
        println("Stream App >> File Data Extractor >> Stream completed successfully")
        killSwitch.shutdown()
    }
  }
}
The exception is as follows:
[ERROR] [02/26/2018 12:13:10.682] [default-akka.actor.default-dispatcher-2] [akka.dispatch.Dispatcher] Failure(akka.stream.impl.io.ByteStringParser$ParsingException: Parsing failed in step ReadHeaders) (of class scala.util.Failure)
scala.MatchError: Failure(akka.stream.impl.io.ByteStringParser$ParsingException: Parsing failed in step ReadHeaders) (of class scala.util.Failure)
at StreamApp$$anonfun$startStreaming$1.apply(StreamApp.scala:71)
at StreamApp$$anonfun$startStreaming$1.apply(StreamApp.scala:71)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
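As an aside, the scala.MatchError at the top of the trace is separate from the S3 problem: the onComplete callback in the code matches only Success, so the stream's Failure falls through the partial function. A minimal sketch of a total handler (the CompletionHandler name and message texts are mine, not from the original code):

```scala
import scala.util.{Failure, Success, Try}

object CompletionHandler {
  // Handle both outcomes, so a Failure no longer escapes the
  // callback as a scala.MatchError.
  def describe(result: Try[Unit]): String = result match {
    case Success(_)  => "Stream completed successfully"
    case Failure(ex) => s"Stream failed: ${ex.getMessage}"
  }
}
```

With a handler like this, the stream's real error (the ParsingException) is logged directly instead of being wrapped in a MatchError.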
I debugged the code and here are my observations:
At this line, readByte() returns 3, so the stage cannot recognize the input as .gz format.
I am unable to decompress chunks of a .gz file. I would like guidance on how to solve this; please point me in the right direction and correct me if I am wrong.
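The readByte() observation is consistent with how gzip framing works: a gzip stream begins with the magic bytes 0x1f 0x8b, which appear only once, at the very start of the file, so a chunk taken from a nonzero byte offset has no header for gunzip's ReadHeaders step to parse. A small illustration using only java.util.zip (the object and method names here are my own):

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

object GzipHeaderCheck {
  // gzip-compress a byte array in memory
  def gzip(data: Array[Byte]): Array[Byte] = {
    val baos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(baos)
    gz.write(data)
    gz.close()
    baos.toByteArray
  }

  // true if the bytes begin with the gzip magic number 0x1f 0x8b
  def hasGzipHeader(bytes: Array[Byte]): Boolean =
    bytes.length >= 2 && (bytes(0) & 0xff) == 0x1f && (bytes(1) & 0xff) == 0x8b
}
```

Dropping even a few leading bytes removes the magic number, which is why downloading with ByteRange.fromOffset and then piping into Compression.gunzip() fails.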
Versions:
Akka version = "2.5.9"
Akka HTTP version = "10.0.11"
Alpakka version = "0.14"
Answer 0 (score: 0)
From the docs, Compression.gunzip():
Creates a Flow that decompresses a gzip-compressed stream of data.
You can use the file utility on Linux or macOS to see which algorithm was used to compress the file. In my case the file was compressed with zlib, so Compression.inflate() did the trick.
See also this question for a good explanation of how zlib, zip, and gzip relate.
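To see the difference concretely, java.util.zip's Deflater emits a zlib-wrapped stream (first byte 0x78, the CMF byte) rather than gzip framing (0x1f 0x8b), which is the distinction between inflate() and gunzip(). A standalone round-trip sketch (helper names are mine):

```scala
import java.util.zip.{Deflater, Inflater}

object ZlibRoundTrip {
  // Compress with the default zlib wrapper that Deflater emits.
  def deflate(data: Array[Byte]): Array[Byte] = {
    val d = new Deflater()
    d.setInput(data)
    d.finish()
    val buf = new Array[Byte](data.length + 64)
    val n = d.deflate(buf)
    d.end()
    buf.take(n)
  }

  // Inflate a zlib-wrapped stream back to the original bytes.
  def inflate(data: Array[Byte], originalLen: Int): Array[Byte] = {
    val i = new Inflater()
    i.setInput(data)
    val out = new Array[Byte](originalLen)
    val n = i.inflate(out)
    i.end()
    out.take(n)
  }
}
```

A file produced this way starts with 0x78, which is what the `file` utility (and Compression.inflate()) would recognize as zlib rather than gzip.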