How do I use foreach inside foreachRDD in Spark Streaming?

Time: 2014-10-01 09:25:36

Tags: apache-spark

I want to read each element inside foreachRDD and do something with each tuple.

The Spark worker memory is set to 756m.

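  import org.apache.spark.streaming.StreamingContext

  def main(args: Array[String]) {
    // Constructor arguments are elided, as in the original post
    val sc = new StreamingContext(".....")
    val dataSet = sc.textFileStream("<HDFS_FILE_PATH>")

    dataSet.foreachRDD(rdd => {
      rdd.foreachPartition((iterator: Iterator[String]) => {
        // Guard against empty partitions before reading the first element
        if (iterator.hasNext) println("1 : " + iterator.next())
      })
    })

    // Start the streaming computation and block until it terminates
    sc.start()
    sc.awaitTermination()
  }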
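
Since the goal is to do something with every element, not only the first one in each partition, the function passed to foreachPartition can walk the whole iterator. Below is a minimal sketch of that function body in plain Scala, with no Spark dependency. The object name `PartitionSketch` and the helper `describePartition` are hypothetical names introduced only for illustration, and the sample values are made up. The helper returns the lines instead of printing them directly so the behaviour is easy to check.

```scala
// Stand-in for the function passed to rdd.foreachPartition: it receives one
// Iterator[String] per partition and visits every element in it.
object PartitionSketch {
  // Hypothetical helper: builds one numbered line per element of the partition.
  def describePartition(iterator: Iterator[String]): List[String] =
    iterator.zipWithIndex.map { case (line, i) => s"${i + 1} : $line" }.toList

  def main(args: Array[String]): Unit = {
    // An empty partition produces no lines instead of throwing NoSuchElementException.
    describePartition(Iterator.empty).foreach(println)
    // A non-empty partition produces one line per element.
    describePartition(Iterator("a", "b", "c")).foreach(println)
  }
}
```

Inside a streaming job the same logic would go in the body of `rdd.foreachPartition`. Note that println called there runs on the executors, so on a cluster its output goes to the executor stdout logs and not to the driver console.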

When I compile the source with sbt and run it on Spark, it does not work: nothing is printed to the console.

14/10/01 18:22:50 INFO MemoryStore: ensureFreeSpace(171438) called with curMem=0, maxMem=1109498265
14/10/01 18:22:50 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 167.4 KB, free 1057.9 MB)
14/10/01 18:22:50 INFO FileInputFormat: Total input paths to process : 1
14/10/01 18:22:50 INFO JobScheduler: Added jobs for time 1412155370000 ms
14/10/01 18:22:50 INFO JobScheduler: Starting job streaming job 1412155370000 ms.0 from job set of time 1412155370000 ms
14/10/01 18:22:50 INFO SparkContext: Starting job: foreachPartition at SbclogCep.scala:54
14/10/01 18:22:50 INFO DAGScheduler: Got job 0 (foreachPartition at SbclogCep.scala:54) with 1 output partitions (allowLocal=false)
14/10/01 18:22:50 INFO DAGScheduler: Final stage: Stage 0(foreachPartition at SbclogCep.scala:54)
14/10/01 18:22:50 INFO DAGScheduler: Parents of final stage: List()
14/10/01 18:22:51 INFO DAGScheduler: Missing parents: List()
14/10/01 18:22:51 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[2] at map at MappedDStream.scala:35), which has no missing parents
14/10/01 18:22:51 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[2] at map at MappedDStream.scala:35)
14/10/01 18:22:51 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/10/01 18:23:00 INFO FileInputDStream: Finding new files took 4 ms
14/10/01 18:23:00 INFO FileInputDStream: New files at time 1412155380000 ms:

14/10/01 18:23:00 INFO JobScheduler: Added jobs for time 1412155380000 ms
14/10/01 18:23:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/10/01 18:23:10 INFO FileInputDStream: Finding new files took 3 ms
14/10/01 18:23:10 INFO FileInputDStream: New files at time 1412155390000 ms:

14/10/01 18:23:10 INFO JobScheduler: Added jobs for time 1412155390000 ms
14/10/01 18:23:20 INFO FileInputDStream: Finding new files took 8 ms
14/10/01 18:23:20 INFO FileInputDStream: New files at time 1412155400000 ms:

14/10/01 18:23:20 INFO JobScheduler: Added jobs for time 1412155400000 ms
14/10/01 18:23:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/10/01 18:23:30 INFO FileInputDStream: Finding new files took 4 ms
14/10/01 18:23:30 INFO FileInputDStream: New files at time 1412155410000 ms:

14/10/01 18:23:30 INFO JobScheduler: Added jobs for time 1412155410000 ms
14/10/01 18:23:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/10/01 18:23:40 INFO FileInputDStream: Finding new files took 3 ms
14/10/01 18:23:40 INFO FileInputDStream: New files at time 1412155420000 ms:

14/10/01 18:23:40 INFO JobScheduler: Added jobs for time 1412155420000 ms
14/10/01 18:23:50 INFO FileInputDStream: Finding new files took 8 ms
14/10/01 18:23:50 INFO FileInputDStream: New files at time 1412155430000 ms:

14/10/01 18:23:50 INFO JobScheduler: Added jobs for time 1412155430000 ms
14/10/01 18:23:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/10/01 18:24:00 INFO FileInputDStream: Finding new files took 4 ms
14/10/01 18:24:00 INFO FileInputDStream: New files at time 1412155440000 ms:

14/10/01 18:24:00 INFO JobScheduler: Added jobs for time 1412155440000 ms
14/10/01 18:24:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/10/01 18:24:10 INFO FileInputDStream: Finding new files took 3 ms
14/10/01 18:24:10 INFO FileInputDStream: New files at time 1412155450000 ms:

14/10/01 18:24:10 INFO JobScheduler: Added jobs for time 1412155450000 ms

Please help me. I want to go home.

0 answers:

No answers