Spark: result of reduceByKey((x,y)=>(x-y))

Time: 2018-12-18 08:34:26

Tags: scala apache-spark

I have an RDD that I use for counting words:

scala> demo2.collect()
res16: Array[String] = Array(hadoop,impala,hive, spark,storm, spark,hive,hive) 

As shown, the RDD contains "hive" three times and "spark" twice, but when I apply a subtraction in reduceByKey:

scala> demo2.flatMap(x => x.split(",")).map(x => (x,1)).reduceByKey((x,y) => (x-y)).collect()
res25: Array[(String, Int)] = Array((hive,1), (impala,1), (spark,0), (hadoop,1), (storm,1))

As shown, (spark,0) is what I expected, but 1-1-1 should equal -1, so why does it return (hive,1) instead of (hive,-1)?

Thanks for your help.

In addition, this is the code I used to create demo2:

val demo2 = sc.textFile("hdfs://my_namenode:8022/tmp/demo_name.csv")

Contents of demo_name.csv:

[root@test-04 tmp]# hdfs dfs -text /tmp/demo_name.csv
hadoop,impala,hive
spark,storm
spark,hive,hive
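
A possible explanation, sketched under assumptions: the Spark documentation describes reduceByKey as merging values with an associative and commutative function, and it combines values inside each partition before merging the partial results across partitions. Subtraction is neither associative nor commutative, so the result depends on how the lines are grouped. The sketch below uses plain Scala collections instead of Spark. The two-partition layout (first two lines in one partition, last line in the other) and the helper names `reduceWithin` and `reduceAcross` are assumptions for illustration, not something shown in the output above.

```scala
// Plain Scala collections stand in for Spark partitions; no Spark dependency needed.
object ReduceByKeySubtraction {
  // Hypothetical partition layout (an assumption, not confirmed by the question).
  val partitions: Seq[Seq[String]] = Seq(
    Seq("hadoop,impala,hive", "spark,storm"),
    Seq("spark,hive,hive")
  )

  // Reduce the values of each key inside one partition, in encounter order.
  def reduceWithin(lines: Seq[String], f: (Int, Int) => Int): Map[String, Int] =
    lines.flatMap(_.split(",")).map(w => (w, 1))
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2).reduceLeft(f) }

  // Merge the per-partition partial results with the same function.
  def reduceAcross(parts: Seq[Map[String, Int]], f: (Int, Int) => Int): Map[String, Int] =
    parts.flatten.groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2).reduceLeft(f) }

  val sub: (Int, Int) => Int = _ - _

  // hive: partition 1 gives 1, partition 2 gives 1 - 1 = 0, merged 1 - 0 = 1
  // spark: partition 1 gives 1, partition 2 gives 1, merged 1 - 1 = 0
  val result: Map[String, Int] = reduceAcross(partitions.map(reduceWithin(_, sub)), sub)

  def main(args: Array[String]): Unit =
    println(result)
}
```

Under this assumed layout the hive values are grouped as 1 - (1 - 1) rather than (1 - 1) - 1, which gives 1 and not -1. A different partitioning or merge order could give a different value, which is why a non-associative function is unreliable here.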

0 answers:

No answers yet.