If the input has the form
(x1,(a,b,c,List(key1, key2)))
(x2,(a,b,c,List(key3)))
I would like to produce this output:
(key1,(a,b,c,x1))
(key2,(a,b,c,x1))
(key3,(a,b,c,x2))
Here is the code:
var hashtags = joined_d.map(x => (x._1, (x._2._1._1, x._2._2, x._2._1._4, getHashTags(x._2._1._4))))
var hashtags_keys = hashtags.map(x => if(x._2._4.size == 0) (x._1, (x._2._1, x._2._2, x._2._3, 0)) else
x._2._4.map(y => (y, (x._2._1, x._2._2, x._2._3, 1))))
The function getHashTags() returns a list. If the list is not empty, I want each element of the list to become a new key. How can I do this?
Answer 0 (score: 1)
Create the rdd as:
val rdd = sc.parallelize(
  Seq(
    ("x1", ("a", "b", "c", List("key1", "key2"))),
    ("x2", ("a", "b", "c", List("key3")))
  )
)
You can use flatMap like this:
rdd.flatMap{ case (x, (a, b, c, list)) => list.map(k => (k, (a, b, c, x))) }.collect
// res12: Array[(String, (String, String, String, String))] =
// Array((key1,(a,b,c,x1)),
// (key2,(a,b,c,x1)),
// (key3,(a,b,c,x2)))
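Note that flatMap drops any row whose list is empty, because list.map produces no elements for it. The code in the question appears to want such rows kept under their original key with a flag of 0. A minimal sketch of that variant follows, written against plain Scala collections so it runs without a Spark cluster. The names Row and rekey, the x3 row, and the exact placement of the 0/1 flag are assumptions for illustration, not something taken from the question.

```scala
object FlatMapEmptyListSketch {
  // Shape of one input record: (id, (a, b, c, keys)). "Row" is a hypothetical alias.
  type Row = (String, (String, String, String, List[String]))

  // Emits one pair per key. A row with no keys is kept once under its original id.
  // The trailing Int (1 = re-keyed, 0 = no keys found) mirrors the flag in the question's code.
  def rekey(row: Row): Seq[(String, (String, String, String, String, Int))] = row match {
    case (x, (a, b, c, Nil))  => Seq((x, (a, b, c, x, 0)))
    case (x, (a, b, c, keys)) => keys.map(k => (k, (a, b, c, x, 1)))
  }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(
      ("x1", ("a", "b", "c", List("key1", "key2"))),
      ("x2", ("a", "b", "c", List("key3"))),
      ("x3", ("a", "b", "c", Nil)) // hypothetical row with no keys
    )
    // Both branches return a Seq, so flatMap can flatten them into one collection.
    rows.flatMap(rekey).foreach(println)
  }
}
```

Since both branches return a Seq, the same function should also work on the RDD as rdd.flatMap(rekey), which avoids the mixed tuple/list return type that the if/else inside map produces.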
Answer 1 (score: 1)
Here is one way:
val rdd = sc.parallelize(Seq(
  ("x1", ("a", "b", "c", List("key1", "key2"))),
  ("x2", ("a", "b", "c", List("key3")))
))
val rdd2 = rdd.flatMap {
  case (x, (a, b, c, l)) => l.map((_, (a, b, c, x)))
}
rdd2.collect
// res1: Array[(String, (String, String, String, String))] = Array((key1,(a,b,c,x1)), (key2,(a,b,c,x1)), (key3,(a,b,c,x2)))