Apache Flink: add a timestamp to the file name

Date: 2017-12-24 09:36:06

Tags: scala apache-flink stream-processing

I want to create a BucketingSink in Flink that writes all files into the same folder, but names each file with the current timestamp instead of an incrementing counter. For example:

part-0-1514107452000.avro

part-0-1514021052000.avro

...
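
From what I understand, BucketingSink builds each part-file name as "<partPrefix>-<subtaskIndex>-<counter>" inside the bucket directory, and I have not found a setter that swaps the counter for a timestamp. The closest workaround I have sketched (only a sketch, assuming BucketingSink's setPartPrefix setter and my own MyType class) bakes a timestamp into the part prefix, which is fixed once at job start rather than per rolled file:

import org.apache.flink.api.java.tuple.Tuple2
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink

// Sketch only: embed a submission-time timestamp in the part-file prefix,
// giving names like part-1514107452000-<subtask>-<counter> rather than a
// fresh timestamp for every rolled file.
val timestampedSink = new BucketingSink[Tuple2[String, MyType]]("/tmp/flink/")
timestampedSink.setPartPrefix(s"part-${System.currentTimeMillis()}")
timestampedSink.setPendingSuffix(".avro")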

Here is my code:

val sink = new BucketingSink[Tuple2[String, MyType]]("/tmp/flink/")
sink.setBucketer(new MyTypeBucketer(new SimpleDateFormat("yyyy-MM-dd--HH")))
sink.setInactiveBucketThreshold(120000) // this is 2 minutes
sink.setBatchSize(1024 * 1024 * 64) // this is 64 MB
sink.setPendingSuffix(".avro")

val writer: AvroKeyValueSinkWriter[String, MyType] = new AvroKeyValueSinkWriter[String, MyType](parseAvroSinkProperties())
sink.setWriter(writer.duplicate())

def parseAvroSinkProperties(): util.Map[String, String] = {
  val properties = new util.HashMap[String, String]()
  val stringSchema = Schema.create(Type.STRING)
  val myTypeSchema = myType.getClassSchema
  val keySchema = stringSchema.toString
  val valueSchema = myTypeSchema.toString
  val compress = true
  properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA, keySchema)
  properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA, valueSchema)
  properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS, compress.toString)
  properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS_CODEC, DataFileConstants.SNAPPY_CODEC)
  properties
}

class MyTypeBucketer(dateFormatter: SimpleDateFormat) extends DateTimeBucketer[Tuple2[String, MyType]] {
  override def getBucketPath(clock: Clock, basePath: Path, element: Tuple2[String, MyType]) = {
    new Path(s"$basePath/${element.f1.getMyStringProp}")
  }
}
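
For reference, here is a sketched variant of my bucketer that actually uses a formatted time (the Clock/Bucketer imports and the exact date pattern are my assumptions here). As far as I understand, a Bucketer only chooses the bucket directory, not the part-file name, which is why the counter still shows up:

import java.text.SimpleDateFormat
import java.util.Date

import org.apache.flink.api.java.tuple.Tuple2
import org.apache.flink.streaming.connectors.fs.Clock
import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer
import org.apache.hadoop.fs.Path

// Sketch: bucket by the element's string property plus the formatted processing time.
// This only shapes the directory layout, e.g. /tmp/flink/<prop>/2017-12-24--09/part-0-0.
class MyTypeDateBucketer(pattern: String = "yyyy-MM-dd--HH")
    extends Bucketer[Tuple2[String, MyType]] {

  override def getBucketPath(clock: Clock, basePath: Path,
                             element: Tuple2[String, MyType]): Path = {
    // SimpleDateFormat is not thread-safe, so build one per call in this sketch.
    val formatter = new SimpleDateFormat(pattern)
    val timePart = formatter.format(new Date(clock.currentTimeMillis()))
    new Path(s"$basePath/${element.f1.getMyStringProp}/$timePart")
  }
}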

Does anyone have any ideas? Thanks.

0 Answers:

No answers yet.