I am working through some Spark tutorials. Suppose we are in the mycode directory and start 'spark-shell' from there. The final directory layout is as follows:
/mycode
|
|-input.txt
|-output
|  |-part-0000
|  |-part-0001
Then I typed the commands as the tutorial says:
scala> val inputfile = sc.textFile("input.txt")
inputfile: org.apache.spark.rdd.RDD[String] = input.txt MapPartitionsRDD[14] at textFile at <console>:24
scala> val counts = inputfile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[17] at reduceByKey at <console>:26
scala> counts.collect().foreach(println)
(talk.,1)
(are,2)
(only,1)
(as,8)
(,1)
(they,7)
(love,,1)
(not,1)
(people,1)
(share.,1)
(or,1)
(care,1)
(beautiful,2)
(walk,1)
(look,,1)
scala> counts.saveAsTextFile("file:///home/hadoop/Mycode/output")
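For reference, here is a minimal sanity check I could run in the same spark-shell session (my own addition, not from the tutorial; the path is copied from the saveAsTextFile call above) to confirm the save produced readable part files:

// Read the saved output directory back; textFile picks up all part-* files in it.
val saved = sc.textFile("file:///home/hadoop/Mycode/output")
// If the save worked, this should print the same (word, count) pairs as above.
saved.collect().foreach(println)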
The strange thing is that when I try to cat or open the file part-0000, a 'No such file or directory' error appears. According to this link, it should not fail even if the file on the host is in a different binary version. I strongly suspect this is caused by a mistake in how I am handling the file system, or by a misconfiguration of my Hadoop or Spark. Can anyone help me? Thanks :)
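P.S. In case it helps with diagnosing, below is a sketch (my own assumption about a reasonable check, not something the tutorial shows) of how I could verify from the same spark-shell session whether the output landed on the local filesystem or on whatever fs.defaultFS points to:

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val hconf = sc.hadoopConfiguration
val outDir = "/home/hadoop/Mycode/output"   // path copied from the saveAsTextFile call

// Check the local filesystem explicitly.
val localFs = FileSystem.get(new URI("file:///"), hconf)
println("on local fs:   " + localFs.exists(new Path("file://" + outDir)))

// Check the default filesystem (often hdfs://... when Hadoop is configured).
val defaultFs = FileSystem.get(hconf)
println("on default fs: " + defaultFs.exists(new Path(outDir)))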