I have a few RDDs of type RDD[(String, Int)]. I want to subtract the integer values by key.
Here is an example. If the input RDDs are
valid_record = (TcustomerTDL_2016266, 16)
deleted_record = (TcustomerTDL_2016266, 8)
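Since the keys are the same, the integer values should be subtracted. I tried using subtractByKey, but it does not seem to work. The expected result is (TcustomerTDL_2016266, 8), which is 16 - 8 = 8.
I used the following code: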
val changes_total = valid_record.subtractByKey(deleted_record)
If there is another way to do this, or if this usage is incorrect, please let me know.
Here is the full code:
import org.apache.spark.{SparkConf, SparkContext}

val Conf = new SparkConf().setAppName("Module").setMaster("local")
val sc = new SparkContext(Conf)
// Read every file in the directory as a (path, content) pair
val incoming_file = sc.wholeTextFiles("D:/Users/Documents/siva_hourly") //changed code
// Key each file by its name (7th path segment) and split its content into lines
val output = incoming_file.map{case(k,v) => (k.split("/")(6),v.split("\\r?\\n"))}
output.cache()
// Extract the change-type flag (3rd field) from every line
val change_type = output.map{case (k,v) => (k,(v.toList.map( x => x.split("\001")(2))))} //changed code
// Count deleted rows ("D") per file, then shorten the key to its first two "_"-separated parts
val change_delete_count = change_type.map{case(k,v) => (k,(v.filter{ x => x == "D" }).length)}
val change_record_foreach4 = change_delete_count.map{case(k,v) => (k.split("_"),v)}
val change_record_foreach3 = change_record_foreach4.map{case(k,v)=>(k(0)+'_'+k(1),v)}
// Count valid rows ("A" or "I") per file, with the same key shortening
val change_valid_count = change_type.map{case(k,v) => (k,(v.filter{ x => x =="A" || x == "I"}).length)}
val change_record_foreach = change_valid_count.map{case(k,v) => (k.split("_"),v)}
val change_record_foreach1 = change_record_foreach.map{case(k,v)=>(k(0)+'_'+k(1),v)}
// Sum the counts per shortened key
val valid_record = change_record_foreach1.reduceByKey((x, y) => x + y)
val deleted_record = change_record_foreach3.reduceByKey((x, y) => x + y)
// Attempted subtraction of the deleted counts from the valid counts
val changes_total = valid_record.subtractByKey(deleted_record)
Answer 0 (score: 5)
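This is not the correct use of subtractByKey. It does not subtract values; it removes every element whose key is present in the other RDD. Here is an example of how subtractByKey works.
Suppose you have two pair RDDs as follows.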
rdd = {(1, 2), (3, 4), (3, 6)}, other = {(3, 9)}
rdd.subtractByKey(other)
The result is:
{(1, 2)}
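To make the behavior above concrete, here is a minimal runnable sketch of the same example. It assumes Spark 1.3 or later (so the pair RDD operations are available without extra imports) and a local master; the object name `SubtractByKeyDemo` and the helper `remainingPairs` are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; not part of the original question or answer.
object SubtractByKeyDemo {
  // Builds the two sample RDDs and returns what subtractByKey leaves behind.
  def remainingPairs(): List[(Int, Int)] = {
    val conf = new SparkConf().setAppName("SubtractByKeyDemo").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq((1, 2), (3, 4), (3, 6)))
      val other = sc.parallelize(Seq((3, 9)))
      // Every pair whose key (3) occurs in `other` is dropped;
      // the values themselves are never compared or subtracted.
      rdd.subtractByKey(other).collect().toList
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(remainingPairs()) // prints List((1,2))
}
```

In the question, both RDDs contain the key TcustomerTDL_2016266, so subtractByKey removes that entry entirely instead of returning 16 - 8.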
You can do it like this instead: join the two RDDs on the key, then subtract the paired values.
val joinRDD = valid_record.join(deleted_record)
val resultRDD = joinRDD.mapValues(x => x._1 - x._2)
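One thing to watch with join: it is an inner join, so a key that has valid records but no deleted records would be missing from the result. If such keys should keep their full count, one possible variation is leftOuterJoin with a default of 0. The sketch below assumes Spark 1.3 or later and a local master; the object name, the helper `subtractCounts`, and the second key TcustomerTDL_2016267 are hypothetical and only serve the illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of the original answer.
object SubtractValuesByKey {
  // Subtracts the deleted count from the valid count for each key.
  // Keys absent from `deleted` are treated as having 0 deletions.
  def subtractCounts(valid: RDD[(String, Int)],
                     deleted: RDD[(String, Int)]): RDD[(String, Int)] =
    valid.leftOuterJoin(deleted).mapValues { case (v, d) => v - d.getOrElse(0) }

  // Runs the helper on the sample record from the question plus one made-up key.
  def demo(): Map[String, Int] = {
    val conf = new SparkConf().setAppName("SubtractValuesByKey").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val validRecord = sc.parallelize(Seq(
        ("TcustomerTDL_2016266", 16),
        ("TcustomerTDL_2016267", 5))) // made-up key with no deletions
      val deletedRecord = sc.parallelize(Seq(("TcustomerTDL_2016266", 8)))
      subtractCounts(validRecord, deletedRecord).collect().toMap
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    demo().foreach(println)
}
```

With the plain join from the answer, only TcustomerTDL_2016266 would appear in the output; with leftOuterJoin the made-up key is kept with its original count of 5.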