I have the following DataFrame:
+------------------------------------+------------------------------+
|MeteVarID |Conc |
+------------------------------------+------------------------------+
|9d71445e-ee5d-4d37-bfb7-02f6e6eacd9d|Friday 0 0.9604490986400536 |
|9d71445e-ee5d-4d37-bfb7-02f6e6eacd9d|Friday 1 0.8109076852795446 |
|9d71445e-ee5d-4d37-bfb7-02f6e6eacd9d|Friday 2 0.7282039568471731 |
|9d71445e-ee5d-4d37-bfb7-02f6e6eacd9d|Friday 3 0.5335418350493728 |
I want to group by MeteVarID and concatenate the Conc strings within each group. The final DataFrame should look like this:
9d71445e-ee5d-4d37-bfb7-02f6e6eacd9d | Friday 0 0.9604490986400536, Friday 1 0.8109076852795446, etc.
Answer 0 (score: -1)
You can use the plain old RDD API and then switch back to a DataFrame.
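// toDF on an RDD needs the session implicits (assumes a SparkSession named spark)
import spark.implicits._

df.rdd
  .map(r => (r.getAs[String]("MeteVarID"), r.getAs[String]("Conc")))  // (key, value) pairs
  .reduceByKey(_ + ", " + _)  // concatenate all Conc values that share a key
  .toDF("MeteVarID", "Conc")
  .show(false)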
+------------------------------------+------------------------------------------------------------------------------------------------------------------+
|MeteVarID |Conc |
+------------------------------------+------------------------------------------------------------------------------------------------------------------+
|9d71445e-ee5d-4d37-bfb7-02f6e6eacd9d|Friday 0 0.9604490986400536, Friday 1 0.8109076852795446, Friday 2 0.7282039568471731, Friday 3 0.5335418350493728|
+------------------------------------+------------------------------------------------------------------------------------------------------------------+
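
As an alternative, the same grouping can probably be done without leaving the DataFrame API, by combining collect_list with concat_ws. The following is only a minimal sketch: the object name ConcatByKey, the helper concatConc, and the local SparkSession setup are assumptions made for illustration and are not part of the original question. Note that neither reduceByKey nor collect_list guarantees the order in which values are concatenated after a shuffle, so sort explicitly if the order matters.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

object ConcatByKey {
  // Hypothetical helper: group on MeteVarID and join every Conc value
  // of the group into one string separated by ", ".
  def concatConc(df: DataFrame): DataFrame =
    df.groupBy("MeteVarID")
      .agg(concat_ws(", ", collect_list(col("Conc"))).as("Conc"))

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-by-key")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val id = "9d71445e-ee5d-4d37-bfb7-02f6e6eacd9d"
    val df = Seq(
      (id, "Friday 0 0.9604490986400536"),
      (id, "Friday 1 0.8109076852795446"),
      (id, "Friday 2 0.7282039568471731"),
      (id, "Friday 3 0.5335418350493728")
    ).toDF("MeteVarID", "Conc")

    concatConc(df).show(false)
    spark.stop()
  }
}
```

Staying in the DataFrame API keeps the schema and lets Spark's optimizer plan the aggregation, whereas the RDD round trip extracts each field from the Row by hand.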