How to add a nested column to a DataFrame

Time: 2018-03-01 09:49:34

Tags: scala apache-spark apache-spark-sql

My DataFrame df has the following schema:

root
 |-- city_name: string (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- age: long (nullable = true)
 |    |-- name: string (nullable = true)

What I want to do is add a nested column, say car_brand, to my person struct. How can I do this?

The expected final schema looks like this:

root
 |-- city_name: string (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- age: long (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- car_brand: string (nullable = true)

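3 answers:

Answer 0 (score: 3)

You can unpack the existing struct and rebuild it as a new struct that also contains the new column. For example, to add "bmw" as car_brand for every person in the DataFrame: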

df.withColumn("person", struct($"person.*", lit("bmw").as("car_brand")))
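For anyone who wants to try this end to end, here is a minimal sketch of the same idea. The case classes `Person` and `Record`, the object name `AddNestedColumn`, the helper `addCarBrand`, and the sample row are all made up for illustration; they only reproduce the schema from the question. It assumes a local Spark installation on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}

// Hypothetical case classes, used only to reproduce the schema from the question.
case class Person(age: Long, name: String)
case class Record(city_name: String, person: Person)

object AddNestedColumn {
  // Rebuilds the person struct from its existing fields plus a constant car_brand field.
  def addCarBrand(df: DataFrame, brand: String): DataFrame =
    df.withColumn("person", struct(col("person.*"), lit(brand).as("car_brand")))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption: not a cluster setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("add-nested-column")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row.
    val df = Seq(Record("Berlin", Person(30L, "Anna"))).toDF()

    val withCar = addCarBrand(df, "bmw")
    withCar.printSchema()
    withCar.show(truncate = false)

    spark.stop()
  }
}
```

If you are on Spark 3.1 or later, `Column.withField` should give the same result without rebuilding the struct by hand, e.g. `df.withColumn("person", col("person").withField("car_brand", lit("bmw")))`. On older versions the struct-rebuild approach above is the one to use.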

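Answer 1 (score: 1)

To add a new nested column inside person:

// Assign to a new val, since df is normally immutable in Scala.
val dfWithNested = df.withColumn(
  "person",
  struct(
    $"person.*",
    struct(
      lit("value_1").as("person_field_1"),
      lit("value_2").as("person_field_2")
    ).as("nested_column_within_person")
  )
)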

Final schema:

root
 |-- city_name: string (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- age: long (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- nested_column_within_person: struct (nullable = true)
 |    |    |-- person_field_1: string (nullable = true)
 |    |    |-- person_field_2: string (nullable = true)

Answer 2 (score: 0)

" Enable true color
if exists('+termguicolors')
  let &t_8f = "\<Esc>[38;2;%lu;%lu;%lum"
  let &t_8b = "\<Esc>[48;2;%lu;%lu;%lum"
  set termguicolors
endif