SPARK Streaming foreachRDD does nothing in local mode

Time: 2016-11-21 11:16:22

Tags: spark-streaming

The following code, running in local mode on Spark Streaming 1.6, does nothing: there are no failures, but no output either.

My question: is this simply not possible in local mode?

Part of the code:

lines.foreachRDD { rdd =>
  val wordCounts = rdd.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
  wordCounts.saveAsTextFile("hdfs://quickstart.cloudera:8020/user/cloudera/output/")
}

When stopping the job, I saw:

org.apache.spark.SparkException: YarnSparkHadoopUtil is not available in non-YARN mode!

The full code, just trying something this simple:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream
import org.apache.hadoop.conf._
import org.apache.hadoop.fs._

val conf = new SparkConf().setMaster("local[4]").setAppName("FileWordCount")
conf.set("spark.driver.allowMultipleContexts", "true")
val ssc = new StreamingContext(conf, Seconds(5))

val lines = ssc.textFileStream("hdfs://quickstart.cloudera:8020/user/cloudera/spark_input/")

lines.foreachRDD { rdd =>
  val wordCounts = rdd.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
  wordCounts.saveAsTextFile("hdfs://localhost:8020/user/cloudera/output/")
}

ssc.start()
ssc.awaitTermination()
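
Before anything else, it is worth checking whether the stream receives records at all: textFileStream only picks up files whose modification time is later than the moment the stream starts, so a file already sitting in spark_input/ is silently ignored. A minimal debugging sketch (a fresh session is assumed; the input path is taken from the code above):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Count the records in every batch and print the result, before trying to
// write anything to HDFS. If the count is always 0, the stream is not
// detecting any new files in the monitored directory.
val debugConf = new SparkConf().setMaster("local[4]").setAppName("FileWordCountDebug")
val debugSsc = new StreamingContext(debugConf, Seconds(5))

val debugLines = debugSsc.textFileStream("hdfs://quickstart.cloudera:8020/user/cloudera/spark_input/")
debugLines.count().print()  // logs a per-batch record count to the console

debugSsc.start()
debugSsc.awaitTermination()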

I managed to get the following to run; why this works while the code above does not is unclear. However, whether I use cp or mv after the stream has started, it produces zero-length files.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("fileCopy").setMaster("local[4]")
sparkConf.set("spark.driver.allowMultipleContexts", "true")
val ssc = new StreamingContext(sparkConf, Seconds(30))
val stream = ssc.textFileStream("hdfs://quickstart.cloudera:8020/user/cloudera/spark_input/")

stream.foreachRDD { rdd =>
  rdd.coalesce(1, true)
    .map(line => (line.split(" ")(1) + ':' + line.split(" ")(2), 1))
    .reduceByKey(_ + _)
    .saveAsTextFile(
      "hdfs://quickstart.cloudera:8020/user/cloudera/output_" + System.currentTimeMillis(),
      classOf[org.apache.hadoop.io.compress.GzipCodec])
}
ssc.start()
ssc.awaitTermination()
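
The zero-length files fit how textFileStream discovers input: a file is registered the moment it appears in the monitored directory, and a copy creates the destination file before the contents have finished arriving, so a batch can read it while it is still empty. (An hdfs dfs -mv within the same HDFS is atomic, so if mv also produced empty files, the move presumably crossed filesystems, e.g. came from the local disk.) The usual pattern is to write the file elsewhere on the same filesystem and then rename it into the watched directory; a sketch with illustrative paths, not taken from the post:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Write the file into a staging directory first, then move it into the
// monitored directory in a single step. rename() is atomic within one
// HDFS namespace, so the stream never observes a half-written file.
val fs = FileSystem.get(new URI("hdfs://quickstart.cloudera:8020"), new Configuration())

val staged  = new Path("/user/cloudera/staging/words-0001.txt")      // already fully written
val watched = new Path("/user/cloudera/spark_input/words-0001.txt")  // picked up by textFileStream

fs.rename(staged, watched)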

Finally, this does work as expected... confusing.

stream.foreachRDD { rdd =>
  rdd.flatMap(_.split(" "))
    .map(x => (x, 1))
    .reduceByKey(_ + _)
    .saveAsTextFile("hdfs://quickstart.cloudera:8020/user/cloudera/outputX_" + System.currentTimeMillis())
}
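
One guess at what separates the failing snippet from the two working ones: saveAsTextFile throws if its target directory already exists, so the fixed output/ path in the first version can succeed for at most one batch, while the timestamped paths above give every batch a fresh directory.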

0 Answers:

No answers yet.