I have the DataFrame below and want to flatten it using only RDD operations. Can anyone help?
Input DataFrame:
+---------+-------------+-----------------+-----+----------------+------------------------------------------------------+
|TPNB     |unitOfMeasure|locationReference|types|types           |effectiveDateTime                                     |
+---------+-------------+-----------------+-----+----------------+------------------------------------------------------+
|079562193|EA           |0810             |STORE|[SELLABLE, HELD]|[2015-10-09T00:55:23.6345Z, 2015-10-09T00:55:23.6345Z]|
+---------+-------------+-----------------+-----+----------------+------------------------------------------------------+
Desired output:
TPNB      unitOfMeasure locationReference types types    effectiveDateTime
079562193 EA            0810              STORE SELLABLE 2015-10-09T00:55:23.6345Z
079562193 EA            0810              STORE HELD     2015-10-09T00:55:23.6345Z
I am trying something like this, but it does not seem to work.
final_output.map(value=>((value(0),value(1),value(2),value(3)),value(5),value(6) )).map{ case(key,value)=>value.map(records=>(key,records)) }
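Answer 0 (score: 1)
To do the transformation using only RDD operations, you can do something like the following after converting the DataFrame to an RDD (for example, via df.rdd):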
val rdd = sc.parallelize(Seq(
("079562193", "EA", "0810", "STORE", List("SELLABLE", "HELD"), List("2015-10-09T00:55:23.6345Z", "2015-10-09T00:55:23.6345Z"))
)).
map{ case (t, u, l, y, ts, ds) => ((t, u, l, y), (ts, ds)) }.
flatMapValues{ case (x, y) => x zip y }.
map{ case ((t, u, l, y), (ts, ds)) => Seq(t, u, l, y, ts, ds) }
rdd.collect.foreach(println)
List(079562193, EA, 0810, STORE, SELLABLE, 2015-10-09T00:55:23.6345Z)
List(079562193, EA, 0810, STORE, HELD, 2015-10-09T00:55:23.6345Z)
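The snippet above builds its RDD with sc.parallelize rather than from the DataFrame itself. As a rough sketch of the df.rdd route, the version below reads each Row by position, which also sidesteps the fact that two columns in the question share the name types. The object name FlattenWithRdd, the local SparkSession settings, and the column name subTypes in the sample data are assumptions made for illustration; they are not from the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object FlattenWithRdd {
  type Flat = (String, String, String, String, String, String)

  // Columns are read by position (0-5), so duplicate column names are not a problem.
  def flatten(df: DataFrame): RDD[Flat] =
    df.rdd.flatMap { row =>
      val types = row.getSeq[String](4) // array column, e.g. [SELLABLE, HELD]
      val dates = row.getSeq[String](5) // array column of timestamps
      // Pair the two arrays element by element and emit one record per pair.
      types.zip(dates).map { case (t, d) =>
        (row.getString(0), row.getString(1), row.getString(2), row.getString(3), t, d)
      }
    }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("flatten-with-rdd").getOrCreate()
    import spark.implicits._

    // Sample row from the question; "subTypes" is a hypothetical column name.
    val df = Seq(
      ("079562193", "EA", "0810", "STORE",
        Seq("SELLABLE", "HELD"),
        Seq("2015-10-09T00:55:23.6345Z", "2015-10-09T00:55:23.6345Z"))
    ).toDF("TPNB", "unitOfMeasure", "locationReference", "types", "subTypes", "effectiveDateTime")

    flatten(df).collect().foreach(println)
    spark.stop()
  }
}
```

Because zip stops at the shorter of the two sequences, rows whose arrays differ in length would silently lose the unmatched elements.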
Answer 1 (score: 1)
Here is what you are looking for on an RDD. Zip the 5th and 6th fields into a Map, then create one row for each entry.
import spark.implicits._
val data = spark.sparkContext
.parallelize(
Seq(
("079562193",
"EA",
"0810",
"STORE",
Array("SELLABLE", "HELD"),
Array("2015-10-09T00:55:23.6345Z", "2015-10-09T00:55:23.6345Z"))
))
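val result = data
  // Pair each type with its timestamp and keep the pairs as a Map
  .map(row => (row._1, row._2, row._3, row._4, row._5.zip(row._6).toMap))
  // flatMap (rather than map) so that each (type, timestamp) entry becomes its own record
  .flatMap(r => r._5.map(v => (r._1, r._2, r._3, r._4, v._1, v._2)))
  .collect()
result.foreach(println)
(079562193,EA,0810,STORE,SELLABLE,2015-10-09T00:55:23.6345Z)
(079562193,EA,0810,STORE,HELD,2015-10-09T00:55:23.6345Z)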
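One caveat about the toMap step: a Map keeps only one value per key, so if the same type appeared twice in a row's array, one of its timestamps would be dropped. The plain Scala sketch below (no Spark needed) illustrates this with made-up data; the object name ZipToMapCaveat and the second timestamp are hypothetical values chosen for the example.

```scala
object ZipToMapCaveat {
  // Hypothetical data: the same type appears twice with different timestamps.
  val types: Seq[String] = Seq("HELD", "HELD")
  val dates: Seq[String] = Seq("2015-10-09T00:55:23.6345Z", "2016-01-01T00:00:00.0000Z")

  // Keeping the zipped pairs as a sequence preserves both entries.
  val asPairs: Seq[(String, String)] = types.zip(dates)

  // Converting to a Map collapses duplicate keys; the last value wins.
  val asMap: Map[String, String] = asPairs.toMap

  def main(args: Array[String]): Unit = {
    println(asPairs.size) // prints 2
    println(asMap.size)   // prints 1
  }
}
```

If duplicate types are possible in your data, dropping the toMap call and working with the zipped pairs directly avoids the loss.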