The following Spark Streaming 1.6 code does nothing in local mode. It does not fail, but it produces no output either.
My question: is this simply not possible in local mode? Part of the code:
lines.foreachRDD(rdd => {
  // Word count over the lines that arrived in this batch
  val wordCounts = rdd.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
  wordCounts.saveAsTextFile("hdfs://quickstart.cloudera:8020/user/cloudera/output/")
})
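Each batch runs this word count only over the lines of files detected during that interval, so an interval in which no new file appears produces an empty result, and saveAsTextFile is still called for it. The following is a plain-Scala model of the per-batch computation, with ordinary Scala collections standing in for an RDD. It is an illustration only, not Spark code; `BatchWordCountModel` and the sample batches are hypothetical.

```scala
object BatchWordCountModel {
  // Plain-Scala stand-in for
  //   rdd.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
  // where `lines` plays the role of the lines that arrived in one batch.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map(x => (x, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Hypothetical batches: only the first interval saw a new file.
    val batches = Seq(Seq("to be or not to be"), Seq.empty[String])
    batches.zipWithIndex.foreach { case (batch, i) =>
      println(s"batch $i -> ${wordCounts(batch)}")
    }
  }
}
```

The second batch yields an empty map: the job still runs for that interval, it just has nothing to count.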
When I stop the job, I see:
org.apache.spark.SparkException: YarnSparkHadoopUtil is not available in non-YARN mode!
The full code, just trying something this simple:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream
import org.apache.hadoop.conf._
import org.apache.hadoop.fs._
// Local mode with 4 threads
val conf = new SparkConf().setMaster("local[4]").setAppName("FileWordCount")
conf.set("spark.driver.allowMultipleContexts", "true")
// 5-second batch interval
val ssc = new StreamingContext(conf, Seconds(5))
// Monitor the HDFS directory for new files
val lines = ssc.textFileStream("hdfs://quickstart.cloudera:8020/user/cloudera/spark_input/")
lines.foreachRDD(rdd => {
  // Word count over the lines that arrived in this batch
  val wordCounts = rdd.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
  wordCounts.saveAsTextFile("hdfs://localhost:8020/user/cloudera/output/")
})
ssc.start()
ssc.awaitTermination()
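Both versions above hand saveAsTextFile the same fixed directory on every batch, whereas the later versions that do produce output make the path unique with System.currentTimeMillis(). Reusing one output directory across batches is fragile with Hadoop output formats, which normally expect the target directory not to exist yet. Spark Streaming's DStream.saveAsTextFiles(prefix, suffix), called on the word-count DStream instead of saveAsTextFile inside foreachRDD, names each batch's output "prefix-TIME_IN_MS[.suffix]" according to the Spark Streaming programming guide. The plain-Scala sketch below reproduces only that naming scheme; `batchDir` and the batch times are hypothetical, and no Spark classes are involved.

```scala
object BatchOutputPaths {
  // Naming scheme documented for DStream.saveAsTextFiles(prefix, suffix):
  // "prefix-TIME_IN_MS[.suffix]". batchDir itself is a hypothetical helper.
  def batchDir(prefix: String, batchTimeMs: Long, suffix: String = ""): String =
    if (suffix.isEmpty) s"$prefix-$batchTimeMs"
    else s"$prefix-$batchTimeMs.$suffix"

  def main(args: Array[String]): Unit = {
    val prefix = "hdfs://quickstart.cloudera:8020/user/cloudera/output"
    // Hypothetical batch times 5 seconds apart, matching Seconds(5).
    val batchTimes = Seq(1000L, 6000L, 11000L)
    val dirs = batchTimes.map(t => batchDir(prefix, t))
    dirs.foreach(println)
    // Every batch gets its own directory, so no batch reuses another's path.
    assert(dirs.distinct.size == dirs.size)
  }
}
```

Using the batch time rather than System.currentTimeMillis() ties each directory to the batch that produced it.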
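I managed to get the following to run, though I am not sure why this one works and the version above does not. However, whether I use cp or mv to add files after starting it, it produces zero-length files.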
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
val sparkConf = new SparkConf().setAppName("fileCopy").setMaster("local[4]")
sparkConf.set("spark.driver.allowMultipleContexts", "true")
// 30-second batch interval
val ssc = new StreamingContext(sparkConf, Seconds(30))
val stream = ssc.textFileStream("hdfs://quickstart.cloudera:8020/user/cloudera/spark_input/")
stream.foreachRDD(rdd =>
  rdd.coalesce(1, true)
    // Key: 2nd and 3rd space-separated fields joined with ':'
    .map(line => (line.split(" ")(1) + ':' + line.split(" ")(2), 1))
    .reduceByKey(_ + _)
    // One gzip-compressed, timestamped output directory per batch
    .saveAsTextFile(
      "hdfs://quickstart.cloudera:8020/user/cloudera/output_" + System.currentTimeMillis(),
      classOf[org.apache.hadoop.io.compress.GzipCodec]))
ssc.start()
ssc.awaitTermination()
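This version writes a new timestamped directory on every 30-second batch whether or not anything arrived, so empty output is consistent with directories written for intervals in which no new file was detected. That cause is an assumption here, not something the logs confirm. A common guard is to check RDD.isEmpty() inside foreachRDD before saving. The plain-Scala model below shows the effect of that guard without running Spark; the `Batch` type and `outputsToWrite` are hypothetical stand-ins.

```scala
object SkipEmptyBatches {
  // Hypothetical stand-in for one micro-batch: its time and the lines that arrived.
  final case class Batch(timeMs: Long, lines: Seq[String])

  // Models the guard `if (!rdd.isEmpty()) rdd.saveAsTextFile(...)`:
  // intervals with no new input produce no output directory at all.
  def outputsToWrite(prefix: String, batches: Seq[Batch]): Seq[(String, Seq[String])] =
    batches.collect {
      case Batch(t, lines) if lines.nonEmpty => (s"${prefix}_$t", lines)
    }

  def main(args: Array[String]): Unit = {
    val batches = Seq(
      Batch(30000L, Seq.empty),
      Batch(60000L, Seq("a b c", "d e f")),
      Batch(90000L, Seq.empty)
    )
    outputsToWrite("output", batches).foreach { case (dir, lines) =>
      println(s"$dir: ${lines.size} line(s)")
    }
  }
}
```

Only the batch that actually received lines gets an output directory.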
Finally, this one does work as expected... I'm confused.
stream.foreachRDD(rdd =>
  rdd.flatMap(_.split(" "))
    .map(x => (x, 1))
    .reduceByKey(_ + _)
    .saveAsTextFile("hdfs://quickstart.cloudera:8020/user/cloudera/outputX_" + System.currentTimeMillis()))
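Whether any of these versions sees data at all also depends on which files textFileStream picks up. The Spark Streaming programming guide states that files must appear in the monitored directory by being atomically moved or renamed into it and must not be changed afterwards, and by default files already present when the stream starts are ignored. The sketch below is a simplified plain-Scala model of that selection rule under those assumptions, not Spark's actual FileInputDStream logic; `FileInfo`, `selectNew`, and the timestamps are hypothetical.

```scala
object NewFileSelectionModel {
  // Hypothetical file metadata: HDFS path and modification time in ms.
  final case class FileInfo(path: String, modTimeMs: Long)

  // Simplified model (an assumption, not Spark's FileInputDStream code):
  // a file is processed once, and only if it was modified at or after the
  // moment the stream started; files already present are skipped.
  def selectNew(files: Seq[FileInfo], streamStartMs: Long, seen: Set[String]): Seq[FileInfo] =
    files.filter(f => f.modTimeMs >= streamStartMs && !seen.contains(f.path))

  def main(args: Array[String]): Unit = {
    val start = 10000L
    val files = Seq(
      FileInfo("spark_input/old.txt", 5000L),  // present before start
      FileInfo("spark_input/new.txt", 12000L)  // written after start
    )
    val firstBatch = selectNew(files, start, Set.empty)
    val secondBatch = selectNew(files, start, firstBatch.map(_.path).toSet)
    println(s"first batch: ${firstBatch.map(_.path)}")
    println(s"second batch: ${secondBatch.map(_.path)}")
  }
}
```

In this model the old file is never read, and the new file is read exactly once, so every later batch is empty until another file arrives.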