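I am using the official Flume + Spark configuration as described in the documentation, but after registering with the host and port, Flume fails to send events. On the Spark side, the messages that were missed are never received.
Below is my configuration: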
TwitterAgent1.sources = PublicStream2
TwitterAgent1.channels = fileCh2
TwitterAgent1.sinks = avrosink2
TwitterAgent1.sources.PublicStream2.type = com.cloudsigma.flume.twitter.TwitterSource
TwitterAgent1.sources.PublicStream2.channels = fileCh2
TwitterAgent1.sources.PublicStream2.consumerKey =
TwitterAgent1.sources.PublicStream2.consumerSecret =
TwitterAgent1.sources.PublicStream2.accessToken =
TwitterAgent1.sources.PublicStream2.accessTokenSecret =
TwitterAgent1.sources.PublicStream2.keywords = some keywords
#TwitterAgent1.sources.PublicStream2.locations = -,-
TwitterAgent1.sources.PublicStream2.language = en
TwitterAgent1.sources.PublicStream2.follow =,
TwitterAgent1.sinks.avrosink2.type = avro
TwitterAgent1.sinks.avrosink2.batch-size = 1
# IP of the host (I am running on a cluster)
TwitterAgent1.sinks.avrosink2.hostname = 1x5.3x.3.1x2
TwitterAgent1.sinks.avrosink2.port = 9988
TwitterAgent1.sinks.avrosink2.channel = fileCh2
TwitterAgent1.channels.fileCh2.type = file
TwitterAgent1.channels.fileCh2.capacity = 10000
TwitterAgent1.channels.fileCh2.transactionCapacity = 10000
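Flume only loads the properties whose keys start with the agent name passed at startup, so a single mistyped prefix silently drops that property. The following is a minimal sketch for flagging such lines. The function name `find_stray_agent_keys` and the abbreviated sample config are hypothetical, and the parser only handles plain `key = value` lines with `#` comments, not the full Java properties syntax.

```python
from collections import Counter


def find_stray_agent_keys(properties_text):
    """Return keys whose agent-name prefix differs from the most common one."""
    keys = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.append(line.split("=", 1)[0].strip())
    if not keys:
        return []
    # The agent name is the first dot-separated component of each key
    prefixes = Counter(key.split(".", 1)[0] for key in keys)
    expected, _ = prefixes.most_common(1)[0]
    return [key for key in keys if key.split(".", 1)[0] != expected]


# Hypothetical, shortened config used only to illustrate the check
sample = """\
TwitterAgent1.sources = PublicStream2
TwitterAgent1.sources.PublicStream2.consumerKey =
TwitterAgent.sources.PublicStream2.accessToken =
TwitterAgent1.sinks.avrosink2.port = 9988
"""
stray = find_stray_agent_keys(sample)
print(stray)
```

Any key reported here would be ignored by an agent started under the majority name.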
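PySpark code:
import warnings

import pyspark as ps
from pyspark import SparkConf
from pyspark.sql import SQLContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

try:
    # Create SparkContext on all CPUs available: in my case I have 4 CPUs on my laptop
    conf = SparkConf().setAppName("tweeterAnalysis")
    sc = ps.SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    print("Just created a SparkContext")
except ValueError:
    warnings.warn("SparkContext already exists in this scope")

# Streaming context with a 10-second batch interval
ssc = StreamingContext(sc, 10)
# Push-based receiver: listens on this host and port for events from the Flume Avro sink
flumeStream = FlumeUtils.createStream(ssc, "pa.pan.net", 41414)
ssc.start()
ssc.awaitTermination()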
Error:
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
    at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:389)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: pan0143.panoulu.net, port: 41414 }: Failed to send batch
    at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:314)
    at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:373)
    ... 3 more
Can anyone help?
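The trace shows the Avro sink failing to send a batch to a specific host and port. With the push-based `FlumeUtils.createStream` approach, the Spark receiver is what listens on that address, so one quick diagnostic is to check whether anything accepts TCP connections there. This is a minimal sketch using only the standard library. The helper name `can_connect` is hypothetical, and a successful connection only shows that a listener exists, not that it speaks Avro.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and applies the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False
```

Run from the machine hosting the Flume agent, a call such as `can_connect("pan0143.panoulu.net", 41414)` (host and port taken from the exception) would indicate whether the sink's target is reachable at all.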