Question

Below is my Flume configuration.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = http
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler
a1.sources.r1.handler.nickname = random props

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://10.0.40.18:9160/flume-test
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

There are no errors in the Flume log file, but there is a problem when I read the file with the hadoop command.

hadoop fs -cat hdfs://10.0.40.18:9160/flume-test/even1393415633931

The Flume log message says the HDFS file created is "hdfs://10.0.40.18:9160/flume-test/even1393415633931".

Any help would be appreciated.

Answer 1

First, try replacing the HDFS sink with a logger sink to check whether the input is arriving correctly.
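
A minimal sketch of that swap, assuming the agent, sink, and channel names (a1, k1, c1) from the question's configuration. The hdfs.* lines for k1 would be commented out or removed while testing:

```properties
# Temporarily replace the HDFS sink with a logger sink.
# Each event received by the HTTP source is written to the Flume log at INFO level.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

If the agent is started with -Dflume.root.logger=INFO,console, the events should also show up on the console, which makes it easier to confirm that the JSON posts arrive as expected.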

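Once that is confirmed, I would suggest tuning the sink's flush settings. The HDFS sink batches events before flushing them to HDFS, and the batch size is controlled by hdfs.batchSize, which defaults to 100. This could be the problem, because you would need to send 100 JSON posts before the first flush of output.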
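
For a low-volume test, the batch size could be lowered so that output is flushed sooner. The value 1 below is only an illustrative assumption for testing, not a recommended production setting:

```properties
# Flush to HDFS after every event instead of waiting for 100 events.
# Illustrative test value; larger batches are usually more efficient.
a1.sinks.k1.hdfs.batchSize = 1
```

The batch size should not exceed the channel's transactionCapacity, which is 100 in the question's configuration.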

Finally, you may also want to try adjusting hdfs.writeFormat, which defaults to Writable rather than Text.
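
As a sketch, again assuming the sink name k1 from the question, the write format could be set like this:

```properties
# Write event bodies as text rather than as Hadoop Writable records.
a1.sinks.k1.hdfs.writeFormat = Text
```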

Answer 2

It sounds like you want a text file, so you should use DataStream like this:

a1.sinks.k1.hdfs.fileType = DataStream

Flume posts data to HDFS, but the output has character problems
