This is my situation (Spark 1.2):
Iterable<("PersonId", Tuple(List<Cat>, List<Dog>))>
{
{
1,{{1,cat1},{1,cat2}},{{1,dog1},{1,dog2}}
},
{
3,{{3,cat5}},{{3,dog5}}
}
}
Rows from the cat table (personId, catName):
{{1,cat1},{1,cat2},{2,cat3},{2,cat4},{3,cat5}}
Rows from the dog table (personId, dogName):
{{1,dog1},{1,dog2},{2,dog3},{2,dog4},{3,dog5}}
Rows from the clientwithkids table (personId):
{1,3}
How can I keep only the persons who have both dogs and cats?
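To make the expected result concrete, here is a minimal plain-Java sketch of the semantics I am after, with no Spark or Cassandra involved. The class name `PetJoinSketch` and the helpers `groupByPerson` and `joinPets` are hypothetical names made up for this illustration. It only shows what the output should contain for the sample rows above; it is not meant as an efficient distributed approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class PetJoinSketch {

    // Hypothetical helper: groups (personId, name) rows by personId.
    public static Map<Integer, List<String>> groupByPerson(String[][] rows) {
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (String[] row : rows) {
            Integer personId = Integer.valueOf(row[0]);
            List<String> names = grouped.get(personId);
            if (names == null) {
                names = new ArrayList<>();
                grouped.put(personId, names);
            }
            names.add(row[1]);
        }
        return grouped;
    }

    // Keeps only the personIds that appear in all three inputs and pairs
    // each one with its list of cats followed by its list of dogs.
    public static Map<Integer, List<List<String>>> joinPets(
            String[][] cats, String[][] dogs, Set<Integer> clients) {
        Map<Integer, List<String>> catsByPerson = groupByPerson(cats);
        Map<Integer, List<String>> dogsByPerson = groupByPerson(dogs);
        Map<Integer, List<List<String>>> result = new TreeMap<>();
        for (Integer personId : clients) {
            if (catsByPerson.containsKey(personId) && dogsByPerson.containsKey(personId)) {
                result.put(personId,
                        Arrays.asList(catsByPerson.get(personId), dogsByPerson.get(personId)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] cats = {{"1", "cat1"}, {"1", "cat2"}, {"2", "cat3"}, {"2", "cat4"}, {"3", "cat5"}};
        String[][] dogs = {{"1", "dog1"}, {"1", "dog2"}, {"2", "dog3"}, {"2", "dog4"}, {"3", "dog5"}};
        Set<Integer> clients = new HashSet<>(Arrays.asList(1, 3));
        // prints {1=[[cat1, cat2], [dog1, dog2]], 3=[[cat5], [dog5]]}
        System.out.println(joinPets(cats, dogs, clients));
    }
}
```

Person 2 has both cats and dogs but is dropped because it is not in clientwithkids, which is the behaviour I want from the RDD version as well.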
This is what I have done so far:
JavaPairRDD<PartitionKey, Iterable<CassandraRow>> catRDD =
    functions.cassandraTable(keyspace, "cat")
        // Group the cat rows by their personId
        .groupBy(new Function<CassandraRow, PartitionKey>() {
            @Override
            public PartitionKey call(CassandraRow row) throws Exception {
                PartitionKey partitionKey =
                    new PartitionKey(row.getString("personId"));
                return partitionKey;
            }
        });
--> [{1,{1,cat1},{1,cat2}},{2,{2,cat3},{2,cat4}},{3,{3,cat5}}]
JavaPairRDD<PartitionKey, Iterable<CassandraRow>> dogRDD =
    functions.cassandraTable(keyspace, "dog")
        // Group the dog rows by their personId
        .groupBy(new Function<CassandraRow, PartitionKey>() {
            @Override
            public PartitionKey call(CassandraRow row) throws Exception {
                PartitionKey partitionKey =
                    new PartitionKey(row.getString("personId"));
                return partitionKey;
            }
        });
--> [{1,{1,dog1},{1,dog2}},{2,{2,dog3},{2,dog4}},{3,{3,dog5}}]
Only for the "personId" values present in this keyspace:
JavaPairRDD<PartitionKey, Iterable<CassandraRow>> keysRDD =
    functions.cassandraTable(keyspace, "clientwithkids")
        .groupBy(new Function<CassandraRow, PartitionKey>() {
            @Override
            public PartitionKey call(CassandraRow row) throws Exception {
                PartitionKey partitionKey =
                    new PartitionKey(row.getString("bp"), row.getString("personId"));
                return partitionKey;
            }
        });
--> [{1},{3}]
What would be an efficient way to do this?
Thanks for your help!