I have two case classes and an RDD of each.
case class Thing1(Id: String, a: String, b: String, c: java.util.Date, d: Double)
case class Thing2(Id: String, e: java.util.Date, f: Double)
val rdd1 = // Loads an rdd of type RDD[Thing1]
val rdd2 = // Loads an rdd of type RDD[Thing2]
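I want to create two new RDD[Thing1]s: one containing the elements of rdd1 whose Id exists in rdd2, and another containing the elements of rdd1 whose Id does not exist in rdd2.
Here is what I have tried (after looking at this, Scala Spark contains vs. does not contain, and other Stack Overflow posts, none of which worked):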
val rdd2_ids = rdd2.map(r => r.Id)
val rdd1_present = rdd1.filter{case r => rdd2_ids contains r.Id}
val rdd1_absent = rdd1.filter{case r => !(rdd2_ids contains r.Id)}
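But this gives me the error: value contains is not a member of org.apache.spark.rdd.RDD[String]
I have seen many questions on SO asking how to do what I want, but none of the answers has worked for me. I keep getting the error value _____ is not a member of org.apache.spark.rdd.RDD[String].
Why do these other answers not work for me, and how can I achieve my goal?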
Answer 0 (score: 0)
I created two simple RDDs.
Now you can join them on the element for which you want to find common values.
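A minimal sketch of such a keyed join, assuming the Thing1 and Thing2 case classes from the question. An RDD is a distributed dataset rather than a local collection, so it has no contains method; membership has to be expressed as a join on Id instead. The object name SplitByJoin and the method name split are hypothetical and used only for illustration.

```scala
import org.apache.spark.rdd.RDD

// Case classes copied from the question.
case class Thing1(Id: String, a: String, b: String, c: java.util.Date, d: Double)
case class Thing2(Id: String, e: java.util.Date, f: Double)

// Hypothetical helper, not part of Spark.
object SplitByJoin {
  // Returns (elements of rdd1 whose Id occurs in rdd2,
  //          elements of rdd1 whose Id does not occur in rdd2).
  def split(rdd1: RDD[Thing1], rdd2: RDD[Thing2]): (RDD[Thing1], RDD[Thing1]) = {
    // Key rdd1 by Id so the pair-RDD operations become available.
    val keyed1 = rdd1.keyBy(_.Id)
    // Only the distinct Ids of rdd2 are needed; distinct() keeps a repeated
    // Id in rdd2 from duplicating rows of rdd1 in the join.
    val ids2 = rdd2.map(t => (t.Id, ())).distinct()

    // Inner join keeps the rdd1 rows whose Id is present in rdd2.
    val present = keyed1.join(ids2).map { case (_, (t1, _)) => t1 }
    // subtractByKey keeps the rdd1 rows whose Id is absent from rdd2.
    val absent = keyed1.subtractByKey(ids2).values

    (present, absent)
  }
}
```

With the RDDs from the question, val (rdd1_present, rdd1_absent) = SplitByJoin.split(rdd1, rdd2) would give the two RDD[Thing1] values that were asked for. If the Ids of rdd2 fit comfortably in driver memory, collecting them into a broadcast Set and filtering rdd1 against it is a possible alternative that avoids the shuffle.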
Answer 1 (score: 0)
Try a full outer join:
val joined = rdd1.map(s=>(s.Id,s)).fullOuterJoin(rdd2.map(s=>(s.Id,s))).cache()
//only in left
joined.filter(s=> s._2._2.isEmpty).foreach(println)
//only in right
joined.filter(s=>s._2._1.isEmpty).foreach(println)
//in both
joined.filter(s=> !s._2._1.isEmpty && !s._2._2.isEmpty).foreach(println)
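The filters above print joined tuples rather than producing the two RDD[Thing1] values the question asks for. A sketch of how the same full outer join could be turned into those RDDs by pattern matching on the Option pair, assuming the case classes from the question; SplitByFullOuterJoin and split are hypothetical names used only for illustration.

```scala
import org.apache.spark.rdd.RDD

// Case classes copied from the question.
case class Thing1(Id: String, a: String, b: String, c: java.util.Date, d: Double)
case class Thing2(Id: String, e: java.util.Date, f: Double)

// Hypothetical helper, not part of Spark.
object SplitByFullOuterJoin {
  // Returns (elements of rdd1 whose Id occurs in rdd2,
  //          elements of rdd1 whose Id does not occur in rdd2).
  def split(rdd1: RDD[Thing1], rdd2: RDD[Thing2]): (RDD[Thing1], RDD[Thing1]) = {
    // Keep a single row per Id on the right side, so that a repeated Id
    // in rdd2 does not duplicate rdd1 rows in the join result.
    val right = rdd2.keyBy(_.Id).reduceByKey((first, _) => first)

    // Each joined value is (Option[Thing1], Option[Thing2]).
    val joined = rdd1.keyBy(_.Id).fullOuterJoin(right).cache()

    // Both sides defined: the Id exists in rdd2.
    val present = joined.collect { case (_, (Some(t1), Some(_))) => t1 }
    // Left side only: the Id does not exist in rdd2.
    val absent = joined.collect { case (_, (Some(t1), None)) => t1 }

    (present, absent)
  }
}
```

Since only elements of rdd1 are wanted, a leftOuterJoin would likely be enough here; the full outer join is kept to stay close to the answer above.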