I created a simple Spark Streaming application that consumes data from Flume using the pull-based approach.
Spark version: 2.2.0
Flume version: 1.7.0
When I run the program on my PC in Eclipse (Run As > Scala Application), it works fine. But after I package it into a jar and submit it with spark-submit, it does not receive any data from Flume. Here is my code:
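import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

def main(args: Array[String]): Unit = {
  val conf = new SparkConf()
    .setAppName("twitter")
    .set("spark.streaming.stopGracefullyOnShutdown", "true")
  val ssc = new StreamingContext(conf, Seconds(30))
  // Pull-based approach: poll the Spark sink that the Flume agent exposes on this host and port
  val flumeStream = FlumeUtils.createPollingStream(ssc, "172.31.190.31", 9999)
  // Decode each Flume event body as a string
  val tweets = flumeStream.map(e => new String(e.event.getBody.array()))
  tweets.print()
  tweets.foreachRDD(rdd => {
    rdd.saveAsTextFile("/warehouse/raw/twitter/data")
  })
  ssc.start()
  ssc.awaitTermination()
}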
I build the program by right-clicking the project, then Run As > Maven build > Goals = package > Run.
Here is how I submit the application:
spark-submit --master local[*] --deploy-mode client --class co.id.linknet.general.StreamingFlume ./spark/lib/linknet-general-1.0.1.jar
Flume config:
TwitterAgent01.sources = Twitter
TwitterAgent01.channels = MemoryChannel01
TwitterAgent01.sinks = HDFS
TwitterAgent01.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent01.sources.Twitter.channels = MemoryChannel01
TwitterAgent01.sources.Twitter.consumerKey = xxx
TwitterAgent01.sources.Twitter.consumerSecret = xxx
TwitterAgent01.sources.Twitter.accessToken = xxx
TwitterAgent01.sources.Twitter.accessTokenSecret = xxx
TwitterAgent01.sources.Twitter.keywords = keyword1, keyword2, keywordN
TwitterAgent01.sinks = sparkStream
TwitterAgent01.sinks.sparkStream.type = org.apache.spark.streaming.flume.sink.SparkSink
TwitterAgent01.sinks.sparkStream.hostname = edge01
TwitterAgent01.sinks.sparkStream.port = 9999
TwitterAgent01.sinks.sparkStream.channel = MemoryChannel01
TwitterAgent01.channels.MemoryChannel01.type = memory
TwitterAgent01.channels.MemoryChannel01.capacity = 10000
TwitterAgent01.channels.MemoryChannel01.transactionCapacity = 10000
Flume and spark-submit run on the same server, and I can telnet to port 9999 from that server itself.
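The telnet check can also be repeated from the JVM, which may help rule out a host name or firewall difference between the shell and the submitted job. This is only a minimal sketch: the object name `PortCheck` and the helper `canConnect` are hypothetical, and the two hosts tried are simply the address used in my code and the hostname used in the Flume config.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object PortCheck {
  // Hypothetical helper: true if a TCP connection to host:port succeeds within timeoutMs.
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers refused connections, timeouts and unresolvable host names
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Values taken from the Spark code (IP address) and the Flume config (hostname)
    Seq("172.31.190.31", "edge01").foreach { host =>
      println(s"$host:9999 reachable = ${canConnect(host, 9999)}")
    }
  }
}
```

If the two hosts give different results, the address passed to `createPollingStream` and the `hostname` of the Spark sink may not refer to the same interface.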
For additional information, I added some required libraries to the Flume and Spark directories.
$FLUME_HOME/lib:
spark-streaming-flume_2.11-2.2.0.jar
spark-streaming-flume-sink_2.11-2.2.0.jar
scala-library-2.11.8.jar
commons-lang3-3.5.jar
$SPARK_HOME/jars:
spark-streaming-flume_2.11-2.2.0.jar
spark-streaming-flume-sink_2.11-2.2.0.jar
scala-library-2.11.8.jar
commons-lang-2.5.jar
commons-lang3-3.5.jar
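To double-check that the jars listed above are actually visible at runtime, a small classpath probe like the following could be run with the same classpath as the application. It is a sketch under assumptions: the object name `ClasspathCheck` and the helper `isOnClasspath` are hypothetical, and the class names are ones I expect each jar to contain (`FlumeUtils` from spark-streaming-flume, `SparkSink` from spark-streaming-flume-sink, `StringUtils` from commons-lang3).

```scala
object ClasspathCheck {
  // Hypothetical helper: true if the named class can be loaded (without initializing it).
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    val expected = Seq(
      "org.apache.spark.streaming.flume.FlumeUtils",     // spark-streaming-flume
      "org.apache.spark.streaming.flume.sink.SparkSink", // spark-streaming-flume-sink
      "org.apache.commons.lang3.StringUtils"             // commons-lang3
    )
    expected.foreach(c => println(s"$c -> ${isOnClasspath(c)}"))
  }
}
```

A `false` for any entry would suggest that the corresponding jar is not being picked up by the JVM that runs the check.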
Am I missing something?