Question

I have the following Spark DataFrame schema:

root
 |-- UserId: long (nullable = true)
 |-- VisitedCountry: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Name: string (nullable = false)
 |    |    |-- Id: long (nullable = false)

I want to turn each VisitedCountry element into a separate row in a new DataFrame:

root
 |-- UserId: long (nullable = true)
 |-- CountryName: string (nullable = false)
 |-- CountryId: long (nullable = false)

Answer 1

Explode and select, in Scala:

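import org.apache.spark.sql.functions.explode
// Assumes `import spark.implicits._` is in scope for the $"..." syntax.

df.withColumn("exploded", explode($"VisitedCountry"))
  .select($"UserId",
    $"exploded.Name".alias("CountryName"),
    $"exploded.Id".alias("CountryId")  // matches the schema's field name `Id`
  )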

Answer 2

You probably want to use the explode function.

See https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=explode

I'm not sure how it will work with structs, though.
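
On the struct question: as far as I can tell, explode emits one row per array element and leaves each element as a struct column, so its fields can then be read with dot notation. Below is a minimal, self-contained sketch of that, assuming Spark is on the classpath and a local session is acceptable. The case classes Country and User, the object ExplodeStructs, the helper flattenCountries, and the sample data are all made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical case classes mirroring the schema in the question.
case class Country(Name: String, Id: Long)
case class User(UserId: Long, VisitedCountry: Seq[Country])

object ExplodeStructs {
  // One output row per array element; struct fields are read with dot notation.
  def flattenCountries(df: DataFrame): DataFrame =
    df.select(col("UserId"), explode(col("VisitedCountry")).as("country"))
      .select(
        col("UserId"),
        col("country.Name").as("CountryName"),
        col("country.Id").as("CountryId")
      )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-structs")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: user 1 visited two countries, user 2 visited one.
    val df = Seq(
      User(1L, Seq(Country("France", 33L), Country("Japan", 81L))),
      User(2L, Seq(Country("Brazil", 55L)))
    ).toDF()

    flattenCountries(df).show()
    spark.stop()
  }
}
```

One thing to watch: explode drops rows whose array is null or empty. If those users should still appear (with null country columns), explode_outer is the variant to try instead.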

Create separate rows for a Spark DataFrame array type
