1) For Categories
twitter handle , categories , sub_categories
handle , Products , MakeUp
handle , Health, MakeUp
handle2 , Services , Face
handle3 , Marketing , Soap
JavaPairRDD<String ,Category> categoryPairRDD
2) For Twitter
Twitter handle , twitter_post , twitter_likes
handle , "Iphone" , 10
handle2 , "Samsung" , 20
JavaPairRDD<String ,Twitter> twitterPairRDD
JavaPairRDD<String, Tuple2<Iterable<Category>, Iterable<Twitter>>> grouped = categoryPairRDD
.cogroup(twitterPairRDD);
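How should I iterate over the cogroup values so that, for each key, the values are printed if a matching object is found and null is printed otherwise?
For example, handle3 is present in my categoryPairRDD but absent from twitterPairRDD, so the output for key handle3 should be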
handle3 , Marketing , Soap , null , null
The final output should be
handle , Products , Makeup , Iphone , 10
handle , Health , Makeup , Iphone , 10
handle2 , Services , Face , Samsung , 20
handle3 , Marketing, Soap , null , null
Answer 0 (score: 1)
I managed to find a solution:
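import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;
// Optional is org.apache.spark.api.java.Optional in Spark 2.x (com.google.common.base.Optional in 1.x)

JavaPairRDD<String, Tuple2<Ontologies, Optional<Twitter>>> left = ontologiesPair.leftOuterJoin(twitterPairRDD);
left.foreach(new VoidFunction<Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>>>() {
    @Override
    public void call(Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>> arg0) throws Exception {
        Optional<Twitter> tweet = arg0._2()._2();
        if (tweet.isPresent()) {
            // Key exists in both RDDs: print values from arg0._2()._1() and tweet.get()
        } else {
            // Key is missing from twitterPairRDD: use an empty placeholder instead
            Twitter empty = new Twitter("", -1);
            // Print values from arg0._2()._1() and the empty tweet object
        }
    }
});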
But I would still like to know of any answer that uses cogroup.
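For the cogroup variant, here is a rough sketch of the per-key iteration, written as plain Java so it runs without a cluster. It relies on the fact that cogroup hands you, for every key, one Iterable per input RDD, and that a key missing from one RDD shows up as an empty Iterable on that side. The `Category` and `Twitter` classes below are hypothetical stand-ins for the ones in the question (their fields are my assumption), and `rows` is a helper name I made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CogroupSketch {
    // Hypothetical stand-in for the question's Category class.
    static class Category {
        final String category;
        final String subCategory;
        Category(String category, String subCategory) {
            this.category = category;
            this.subCategory = subCategory;
        }
    }

    // Hypothetical stand-in for the question's Twitter class.
    static class Twitter {
        final String post;
        final int likes;
        Twitter(String post, int likes) {
            this.post = post;
            this.likes = likes;
        }
    }

    // Builds the output rows for one key from the two sides of a cogroup.
    // An empty tweets Iterable means the key was absent from the Twitter RDD.
    static List<String> rows(String handle, Iterable<Category> categories, Iterable<Twitter> tweets) {
        List<String> out = new ArrayList<>();
        boolean hasTweets = tweets.iterator().hasNext();
        for (Category c : categories) {
            String prefix = handle + " , " + c.category + " , " + c.subCategory;
            if (!hasTweets) {
                // No match on the Twitter side: pad with nulls.
                out.add(prefix + " , null , null");
                continue;
            }
            for (Twitter t : tweets) {
                out.add(prefix + " , " + t.post + " , " + t.likes);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Category> cats = Arrays.asList(
                new Category("Products", "MakeUp"),
                new Category("Health", "MakeUp"));
        List<Twitter> tweets = Collections.singletonList(new Twitter("Iphone", 10));
        rows("handle", cats, tweets).forEach(System.out::println);

        // handle3 has a category but no tweets, so its row is padded with nulls.
        rows("handle3",
                Collections.singletonList(new Category("Marketing", "Soap")),
                Collections.<Twitter>emptyList()).forEach(System.out::println);
    }
}
```

On the real RDD you would presumably call this from a flatMap over `grouped`, passing `t._1()`, `t._2()._1()` and `t._2()._2()` for each tuple `t` (in Spark 2.x the flatMap function has to return an Iterator, so append `.iterator()` to the list). Keys that exist only in twitterPairRDD produce no rows here because the category side is empty, which matches what leftOuterJoin gives you.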