Streaming sink to S3

Time: 2018-08-14 15:45:08

Tags: amazon-s3 apache-flink flink-streaming

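I'm trying to create an S3 sink for my streaming output. I thought BucketingSink would be a good fit, since it is used with HDFS, but it seems the S3 URL is not recognized as an HDFS path. I get the following error: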

Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.RuntimeException: Error while creating FileSystem when initializing the state of the BucketingSink.
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Cannot support file system for 's3' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath.

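Is there a way to make S3 work with BucketingSink, or is there an alternative to BucketingSink that I could use? I'm currently running Flink 1.5.2. Happy to provide more information.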

Thanks!

Edit:

My sink is created and used as follows:

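import org.apache.flink.streaming.connectors.fs.StringWriter
import org.apache.flink.streaming.connectors.fs.bucketing.{BucketingSink, DateTimeBucketer}

val s3Sink = new BucketingSink[String]("s3://s3bucket/sessions")
// One bucket directory per minute, named after the formatted timestamp
s3Sink.setBucketer(new DateTimeBucketer[String]("yyyy-MM-dd--HHmm"))
s3Sink.setWriter(new StringWriter[String]())
s3Sink.setBatchSize(200) // part-file size threshold, in bytes
s3Sink.setPendingPrefix("sessions-")
s3Sink.setPendingSuffix(".csv")

// Create stream and do stuff here
stream.addSink(s3Sink)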

1 Answer:

Answer 0 (score: 2)

You probably need to include the hadoop-aws jar in your Flink job. This link should help: https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/deployment/aws.html#provide-s3-filesystem-dependency
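
For an sbt build, adding the dependency might look roughly like the fragment below. This is only a sketch: the question does not say which build tool or Hadoop version is in use, so the `hadoopVersion` value is a placeholder assumption and should be changed to match the Hadoop version your Flink distribution was built against.

```scala
// build.sbt (fragment)

// Assumption: placeholder version, not taken from the question.
// Use the Hadoop version that matches your Flink 1.5.2 distribution.
val hadoopVersion = "2.8.3"

libraryDependencies ++= Seq(
  // Provides the Hadoop S3 file system classes (for example S3AFileSystem);
  // the AWS SDK is pulled in as a transitive dependency.
  "org.apache.hadoop" % "hadoop-aws" % hadoopVersion
)
```

Note that hadoop-aws registers S3AFileSystem under the `s3a://` scheme, so you may also have to either write to an `s3a://` URL or map `fs.s3.impl` to `org.apache.hadoop.fs.s3a.S3AFileSystem` in your Hadoop configuration, as the linked page describes. If the job runs on a cluster, the jar has to be visible to the Flink processes there as well, for example by packaging it into the job's fat jar or placing it in Flink's `lib` directory.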