Apache spark-streaming application output not forwarded to master

Asked: 2014-07-03 19:33:05

Tags: streaming apache-spark flume

I am trying to run the following FlumeEvent example:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeEventCount {
  def main(args: Array[String]) {

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumeEventCount")
      .set("spark.cleaner.ttl", "3")

    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream
    val stream = FlumeUtils.createStream(ssc, "192.168.1.5", 3564, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()
    stream.count.print()
    stream.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

My sbt file is as follows:

import AssemblyKeys._

assemblySettings

name := "flume-test"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming-flume" % "1.0.0" exclude("org.apache.spark","spark-core") exclude("org.apache.spark", "spark-streaming_2.10")

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

I run the program with the following command:

/tmp/spark-1.0.0-bin-hadoop2/bin/spark-submit --class FlumeEventCount --master local --deploy-mode client /tmp/fooproj/target/scala-2.10/cert-log-manager-assembly-1.0.jar 

On the other side, the Flume agent is sending everything correctly, and I can see in the Spark logs that the events are being received.

I haven't changed any of Spark's configuration or set any environment variables; I just downloaded and unpacked the distribution.

Can anyone tell me what I'm doing wrong?

// edit: when I run Spark's own FlumeEventCount example, it works.
// edit2: if I remove awaitTermination and add an ssc.stop, it prints everything at once; I guess this happens because something gets flushed.

1 Answer:

Answer 0 (score: 5)

... I should learn to RTFM more carefully.

Quoting this page: https://spark.apache.org/docs/latest/streaming-programming-guide.html

// Spark Streaming needs at least two worker threads
val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))

I had been launching Spark with only one thread. The following works fine:

stream.map(event => "Event: header:" + event.event.get(0).toString + " body:" + new String(event.event.getBody.array)).print()
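For reference, a minimal sketch of the same fix applied inside the program instead of on the command line (calling setMaster on the SparkConf is my addition, not part of the original code); note that values set directly on a SparkConf take precedence over spark-submit flags:

// Give the local master at least two threads: one for the Flume receiver
// and at least one for processing the received batches.
val sparkConf = new SparkConf()
  .setAppName("FlumeEventCount")
  .setMaster("local[2]")
val ssc = new StreamingContext(sparkConf, Milliseconds(2000))

Equivalently, leaving the code untouched and submitting with --master "local[2]" instead of --master local gives the streaming job the extra thread it needs.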