I have an RDD that I use for counting words:
scala> demo2.collect()
res16: Array[String] = Array(hadoop,impala,hive, spark,storm, spark,hive,hive)
As shown, the RDD contains 3 occurrences of hive and 2 of spark, but when I apply the - (subtraction) operation in reduceByKey:
scala> demo2.flatMap(x => x.split(",")).map(x => (x,1)).reduceByKey((x,y) => (x-y)).collect()
res25: Array[(String, Int)] = Array((hive,1), (impala,1), (spark,0), (hadoop,1), (storm,1))
As shown, (spark,0) is what I expected, since 1-1 = 0. But for hive, 1-1-1 should equal -1, so why does it return (hive,1) instead of (hive,-1)?
Thanks for your help.
In addition, this is the code I used to create demo2:
val demo2 = sc.textFile("hdfs://my_namenode:8022/tmp/demo_name.csv")
Contents of demo_name.csv:
[root@test-04 tmp]# hdfs dfs -text /tmp/demo_name.csv
hadoop,impala,hive
spark,storm
spark,hive,hive
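
The 1-1-1 = -1 arithmetic above assumes the three values for hive are reduced strictly left to right. Spark's reduceByKey is documented as expecting an associative and commutative function, and it merges values locally within each partition before combining the partial results, so the grouping may differ. Below is a minimal plain-Scala sketch (no Spark needed) of how the grouping alone changes the result. The two-partition split is an assumption for illustration, not something confirmed from the actual job, and the names SubtractGrouping, hivePartitions, and sparkPartitions are hypothetical.

```scala
// Plain Scala, no Spark required. The partition layout below is an assumption.
object SubtractGrouping {
  val sub: (Int, Int) => Int = (x, y) => x - y

  // Strict left-to-right reduction of the three 1s for "hive": (1 - 1) - 1
  val leftToRight: Int = List(1, 1, 1).reduceLeft(sub)

  // Hypothetical split: file lines 1-2 in one partition, line 3 in another.
  // "hive" then appears once in the first partition and twice in the second.
  val hivePartitions: List[List[Int]] = List(List(1), List(1, 1))
  // "spark" appears once in each partition under the same assumed split.
  val sparkPartitions: List[List[Int]] = List(List(1), List(1))

  // Reduce inside each partition first, then merge the partial results.
  def reducePerPartition(parts: List[List[Int]]): Int =
    parts.map(_.reduceLeft(sub)).reduceLeft(sub)

  val hiveMerged: Int = reducePerPartition(hivePartitions)   // 1 - (1 - 1)
  val sparkMerged: Int = reducePerPartition(sparkPartitions) // 1 - 1

  def main(args: Array[String]): Unit = {
    println(s"hive, left-to-right: $leftToRight")
    println(s"hive, per-partition then merged: $hiveMerged")
    println(s"spark, per-partition then merged: $sparkMerged")
  }
}
```

Under this assumed split, the per-partition grouping gives 1 for hive and 0 for spark, which matches the output shown in the question. Because subtraction is neither associative nor commutative, the result depends on how the data happens to be partitioned and merged.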