NetcatSource: Client sent event exceeding the maximum length

Date: 2016-03-30 08:17:19

Tags: json hadoop apache-spark netcat flume-ng

Hi everyone, and thanks in advance for taking the time to read this :) I am trying to send a JSON object (about 15 KB) to my Hadoop cluster so I can process it with Spark. I set up my Flume agent like this:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 41400
a1.sources.r1.max-line-length = 512000
a1.sources.r1.eventSize = 512000
#a1.sources.deserializer.maxLineLength = 512000

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /hadoop/hdfs/data
a1.sinks.k1.hdfs.filePrefix = CDR
a1.sinks.k1.hdfs.callTimeout = 15000
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 226
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.batchSize = 226

# Use a channel which buffers events in memory
a1.channels.c1.type = file
a1.channels.c1.capacity = 512000
a1.channels.c1.transactionCapacity = 512000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Besides that, I have a Perl script that sends the JSON object through a socket on the specified port, but when I start the Flume agent I get this message:

 WARN source.NetcatSource: Client sent event exceeding the maximum length

What I don't understand is that I set the maximum line length for an event to 512000 bytes, which is much larger than 15 KB. Can anyone help me? Thanks, and sorry for my bad English.

1 answer:

Answer 0 (score: 0)

Verify that the JSON your Perl script sends ends with a newline character (EOL).

Cf. the documentation: https://flume.apache.org/FlumeUserGuide.html#netcat-source
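
As a minimal sketch of what the client side would look like (shown in Python rather than the original Perl, purely for illustration; the payload is a placeholder and the host/port match the config above):

import json
import socket

# Placeholder payload; in the original setup this is a ~15 KB JSON object.
event = {"example": "payload"}

# Connect to the Flume netcat source configured above (localhost:41400).
with socket.create_connection(("localhost", 41400)) as sock:
    # The netcat source reads newline-terminated lines, so serialize the JSON
    # as a single line, append "\n", and keep it under max-line-length.
    line = json.dumps(event) + "\n"
    sock.sendall(line.encode("utf-8"))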