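Can we add a delimiter for the HDFS Sink? When events are written to the file, how can we add a record separator between them?
Here is the configuration: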
tier1.sinks.hdfssink.type = hdfs
tier1.sinks.hdfssink.channel = memory
tier1.sinks.hdfssink.hdfs.path=tmp/kafka/%{topic}/%y-%m-%d
tier1.sinks.hdfssink.hdfs.rollSize=268435456
tier1.sinks.hdfssink.hdfs.rollCount=0
tier1.sinks.hdfssink.hdfs.rollInterval = 0
tier1.sinks.hdfssink.hdfs.useLocalTimeStamp=true
tier1.sinks.hdfssink.hdfs.fileType=DataStream
tier1.sinks.hdfssink.hdfs.inUseSuffix=.tmp
tier1.sinks.hdfssink.hdfs.batchSize=10000
Answer 0 (score: 0)
I would use a Flume EventSerializer, configured along these lines:
tier1.sinks.hdfssink.serializer = <your serialization class>
tier1.sinks.hdfssink.serializer.delimiter = <your delimiter>
For details and code snippets, see the following GitHub repository:
https://github.com/relistan/flume-serializers
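To make the idea concrete, here is a minimal sketch of the core logic such a serializer would need: write each event body to the output stream, then write the delimiter bytes. It is plain Java with no Flume dependency so it can be run on its own. The class name DelimitedBodyWriter and the render helper are hypothetical, not part of Flume or of the repository above. In a real serializer, as far as I understand the API, this logic would sit in the write(Event) method of a class implementing org.apache.flume.serialization.EventSerializer, and the delimiter would be read from the Context passed to its EventSerializer.Builder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a custom Flume serializer; not a Flume class.
class DelimitedBodyWriter {
    private final OutputStream out;
    private final byte[] delimiter;

    // In Flume, the stream and the delimiter setting would be handed over
    // by the sink when it builds the serializer (assumption, see lead-in).
    DelimitedBodyWriter(OutputStream out, String delimiter) {
        this.out = out;
        this.delimiter = delimiter.getBytes(StandardCharsets.UTF_8);
    }

    // Counterpart of EventSerializer.write(Event): body first, then delimiter.
    void write(byte[] body) throws IOException {
        out.write(body);
        out.write(delimiter);
    }

    void flush() throws IOException {
        out.flush();
    }

    // Convenience for trying it out: serializes the bodies into a string.
    static String render(String delimiter, String... bodies) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DelimitedBodyWriter writer = new DelimitedBodyWriter(sink, delimiter);
        try {
            for (String body : bodies) {
                writer.write(body.getBytes(StandardCharsets.UTF_8));
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two records separated (and terminated) by "|".
        System.out.println(render("|", "first event", "second event"));
    }
}
```

If a newline between records is all you need, the built-in text serializer already appends one to each event by default (controlled by its appendNewline option), so a custom class is only required for other delimiters.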
Hope this helps!