Suppose this is my CSV file:
attr1;attr2
11111;MOC
22222;MTC
11111;MOC
22222;MOC
33333;MMS
For the rows where attr2 = MOC, I want the number of occurrences of each value in the first column, like this:
(11111,2)
(22222,1)
I tried:
val sc = new SparkContext(conf)
val textFile = sc.textFile(args(0))
val data = textFile.map(line => line.split(";").map(elem => elem.trim))
val header = new SimpleCSVHeader(data.take(1)(0))
val rows = data.filter(line => header(line,"attr1") != "attr1")
val attr1 = rows.map(row => header(row,"attr1"))
val attr2 = rows.map(row => header(row,"attr2"))
val counts = attr1.map(k => (k, 1)).reduceByKey(_ + _)
counts.foreach(println)
How can I add the condition to my code? The result of my code is:
(11111,2)
(22222,2)
(33333,1)
Answer 0 (score: 0)
Use a filter (again):
val rows = data
  .filter(line => header(line, "attr1") != "attr1")
  .filter(line => header(line, "attr2") == "MOC")
Then continue as before: map each remaining row to its attr1 value and count with reduceByKey.
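For reference, here is a minimal end-to-end sketch of the pipeline with the extra filter in place. Several parts are assumptions, not taken from the question: the `SimpleCSVHeader` class shown here is a hypothetical stand-in for the asker's own class (it only maps column names to positions), the `MocCount` object and `countByAttr1` helper are illustrative names, the sample rows are inlined with `sc.parallelize` instead of being read with `sc.textFile(args(0))`, and the `local[*]` master is only there so the sketch can run without a cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the question's SimpleCSVHeader:
// it maps each column name to its position in the header row.
class SimpleCSVHeader(header: Array[String]) extends Serializable {
  private val index: Map[String, Int] = header.zipWithIndex.toMap
  def apply(row: Array[String], column: String): String = row(index(column))
}

object MocCount {
  // Sample rows copied from the question, header line included.
  val sample: Seq[String] = Seq(
    "attr1;attr2", "11111;MOC", "22222;MTC", "11111;MOC", "22222;MOC", "33333;MMS")

  // Counts attr1 values over the rows whose attr2 equals attr2Value.
  def countByAttr1(lines: RDD[String], attr2Value: String): RDD[(String, Int)] = {
    val data = lines.map(line => line.split(";").map(_.trim))
    val header = new SimpleCSVHeader(data.take(1)(0))
    val rows = data
      .filter(line => header(line, "attr1") != "attr1")    // drop the header row
      .filter(line => header(line, "attr2") == attr2Value) // keep matching rows only
    rows.map(row => (header(row, "attr1"), 1)).reduceByKey(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so that the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("MocCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = countByAttr1(sc.parallelize(sample), "MOC")
      // Collect to the driver and sort so the printed order is stable.
      counts.collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

On the sample data this should print (11111,2) and (22222,1), which is the output asked for in the question. The two chained filters could also be merged into one predicate joined with `&&`; keeping them separate simply mirrors the answer.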