Question

I searched for this but couldn't find anything I could adapt to my situation. I have a dataframe like this:

+-----------------+---------------+
|             keys|         values|
+-----------------+---------------+
|[one, two, three]|[101, 202, 303]|
+-----------------+---------------+

The keys column holds an array of strings, and the values column holds an array of ints.

I want to create a new column containing a map from keys to values, like this:

+-----------------+---------------+---------------------------+
|             keys|         values|                        map|
+-----------------+---------------+---------------------------+
|[one, two, three]|[101, 202, 303]|Map(one->101, two->202, etc|
+-----------------+---------------+---------------------------+

I have been looking at this question, but I'm not sure whether it can serve as a starting point for my case: Spark DataFrame columns transform to Map type and List of Map Type

I need this in Scala.

Thanks!

Answer 1

You can create a UDF similar to the one in the linked question:

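import org.apache.spark.sql.functions.udf

// Pair each key with the value at the same position, then build a Map.
val toMap = udf((keys: Seq[String], values: Seq[Int]) => {
  keys.zip(values).toMap
})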

And use it as:

df.withColumn("map", toMap($"keys", $"values"))
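For anyone who wants to try the UDF approach end to end, here is a minimal sketch. It assumes a local SparkSession and rebuilds the one-row dataframe from the question; the object name ZipToMapUdf and the helper zipToMap are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Hypothetical wrapper object, only here to make the sketch runnable.
object ZipToMapUdf {
  // Pure helper: pairs keys with values by position.
  // zip stops at the shorter sequence; for duplicate keys the last pair wins.
  def zipToMap(keys: Seq[String], values: Seq[Int]): Map[String, Int] =
    keys.zip(values).toMap

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("zip-to-map-udf")
      .getOrCreate()
    import spark.implicits._ // needed for toDF and the $"col" syntax

    // Same shape as the dataframe in the question.
    val df = Seq((Seq("one", "two", "three"), Seq(101, 202, 303)))
      .toDF("keys", "values")

    // Wrap the helper as a UDF taking two array columns.
    val toMap = udf((keys: Seq[String], values: Seq[Int]) => zipToMap(keys, values))

    val result = df.withColumn("map", toMap($"keys", $"values"))
    result.printSchema()
    result.show(false)

    spark.stop()
  }
}
```

Note that the helper does not guard against null arrays, so a row with a null keys or values column would fail inside the UDF; add a null check if your data can contain such rows.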

Answer 2

Since Spark 2.4 there is a built-in version, def map_from_arrays(keys: Column, values: Column): Column, in org.apache.spark.sql.functions.
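A minimal sketch of how map_from_arrays could be applied to the dataframe from the question, assuming Spark 2.4 or later and a local SparkSession. The object name MapFromArraysExample and the helper withMap are hypothetical names used only for this illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, map_from_arrays}

// Hypothetical wrapper object, only here to make the sketch runnable.
object MapFromArraysExample {
  // Adds a "map" column built from the "keys" and "values" array columns.
  def withMap(df: DataFrame): DataFrame =
    df.withColumn("map", map_from_arrays(col("keys"), col("values")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-from-arrays")
      .getOrCreate()
    import spark.implicits._ // needed for toDF

    // Same shape as the dataframe in the question.
    val df = Seq((Seq("one", "two", "three"), Seq(101, 202, 303)))
      .toDF("keys", "values")

    val result = withMap(df)
    result.printSchema() // the new column should be a map type
    result.show(false)

    spark.stop()
  }
}
```

Unlike the zip-based UDF, the built-in function does not silently truncate: as far as I know it expects both arrays to have the same length and does not allow null keys, so it is worth checking the input data if either case can occur.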

Create a map column in Apache Spark from other columns
