My DataFrame looks like this:
+-------------------+-------------+
|        Nationality|    continent|
+-------------------+-------------+
|       Turkmenistan|         Asia|
|         Azerbaijan|         Asia|
|             Canada|North America|
|         Luxembourg|       Europe|
|             Gambia|       Africa|
My output should look like this:
Map(Gibraltar -> Europe, Haiti -> North America)
So I am trying to convert the DataFrame into a
scala.collection.mutable.Map[String, String]()
I am trying the following code:
var encoder = Encoders.product[(String, String)]
val countryToContinent = scala.collection.mutable.Map[String, String]()
var mapped = nationalityDF.mapPartitions((it) => {
  ....
  ....
  countryToContinent.toIterator
})(encoder).toDF("Nationality", "continent").as[(String, String)](encoder)
val map = mapped.rdd.groupByKey.collect.toMap
But the resulting map has the following output:
Map(Gibraltar -> CompactBuffer(Europe), Haiti -> CompactBuffer(North America))
How can I get the map without the CompactBuffer wrappers?
Answer 0 (score: 2)
Let's create some data:
val df = Seq(
  ("Turkmenistan", "Asia"),
  ("Azerbaijan", "Asia")
).toDF("Country", "Continent")
Try mapping each row to a tuple first, then collecting into a map:
df.map{ r => (r.getString(0), r.getString(1))}.collect.toMap
Output:
scala.collection.immutable.Map[String,String] = Map(Turkmenistan -> Asia, Azerbaijan -> Asia)
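For context, the CompactBuffer in the question comes from groupByKey, which gathers all values for a key into an Iterable, so each map value ends up being a collection rather than a single String. Skipping groupByKey avoids that. Since the question asks for a scala.collection.mutable.Map, here is a minimal sketch of one way to get there. The object name CountryMapSketch, the helper toMutableMap, and the local SparkSession setup are assumptions made for illustration; the sketch also assumes the first two columns are non-null strings and that the data is small enough to collect on the driver.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.collection.mutable

// Hypothetical wrapper object, used only for this illustration.
object CountryMapSketch {

  // Collects the first two string columns to the driver and builds a mutable map.
  // If a key occurs more than once, the last collected row wins.
  def toMutableMap(df: DataFrame): mutable.Map[String, String] = {
    val pairs = df.collect().map(r => r.getString(0) -> r.getString(1))
    mutable.Map(pairs: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("country-map-sketch")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a Seq of tuples

    val df = Seq(
      ("Turkmenistan", "Asia"),
      ("Azerbaijan", "Asia")
    ).toDF("Country", "Continent")

    val countryToContinent = toMutableMap(df)
    println(countryToContinent("Turkmenistan"))

    spark.stop()
  }
}
```

If one value per key is acceptable but the keys may repeat, it may be worth deduplicating on the DataFrame side before collecting, so that the surviving value does not depend on row order.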