How to get values into the pivoted columns of a Spark Scala DataFrame

Posted: 2019-11-17 03:12:14

Tags: scala dataframe apache-spark pivot

My DataFrame contains the following values:

+---+-----------------------+----------+---------+------------+---------+
|id |database_name          |users     |groups   |type        |isAllowed|
+---+-----------------------+----------+---------+------------+---------+
|73 |[ww_hr_dl_highsecure]  |[hive]    |[hrhs]   |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |update      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |create      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |drop        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |alter       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |index       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |lock        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |all         |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |read        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |write       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |repladmin   |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |serviceadmin|true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |select      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |update      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |create      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |drop        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |alter       |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |all         |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |read        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |write       |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|select      |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|read        |true     |
+---+-----------------------+----------+---------+------------+---------+

I want to pivot the DataFrame on the type column, so that the resulting DataFrame looks like this:

id  db_name                users     groups  select  update  create  drop  alter
73  ww_hr_dl_highsecure    hive      hrhs    true    null    null    null  null
73  ww_hr_dl_highsecure    svchrdat  null    true    true    true    true  true
84  ww_core_dim_dl_tables  svc02001  null    true    true    true    true  true

What I cannot figure out is how to fill the new pivoted columns with the values taken from the isAllowed column of the original DataFrame.

What I have tried so far:

// does not compile: pivot must be followed by an aggregation via .agg(...)
val dfs3 = dfs2.groupBy("database_name","users").pivot("type").expr("isAllowed")

1 answer:

Answer 0 (score: 0)

Could you add groups to the groupBy as well?

df.groupBy("database_name","users","groups").pivot("type").agg(first("isAllowed")).show(false)

Output:

+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|database_name          |users     |groups   |all |alter|create|drop|index|lock|read|repladmin|select|serviceadmin|update|write|
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|[ww_core_dim_dl_tables]|[svc02001]|[]       |true|true |true  |true|null |null|true|null     |true  |null        |true  |true |
|[ww_hr_dl_highsecure]  |[svchrdat]|[]       |true|true |true  |true|true |true|true|true     |true  |true        |true  |true |
|[ww_hr_dl_highsecure]  |[hive]    |[hrhs]   |null|null |null  |null|null|null|null|null     |true  |null        |null  |null |
|[ww_core_dim_dl_tables]|[]        |[walmart]|null|null |null  |null|null |null|true|null     |true  |null        |null  |null |
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
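To get closer to the exact layout asked for in the question, the pivoted result can be post-processed: flatten the single-element array columns into plain strings and select only the permission columns shown in the desired output. A minimal sketch, assuming Spark 2.x and that `dfs2` is the original DataFrame from the question (the names `pivoted` and `result` are illustrative, not from the original post):

```scala
import org.apache.spark.sql.functions._

// Pivot on type, carrying isAllowed through with first(); also group by id
// so it survives into the result.
val pivoted = dfs2
  .groupBy("id", "database_name", "users", "groups")
  .pivot("type")
  .agg(first("isAllowed"))

val result = pivoted
  // concat_ws collapses array columns such as [ww_hr_dl_highsecure]
  // into plain strings (empty arrays become empty strings)
  .withColumn("db_name", concat_ws(",", col("database_name")))
  .withColumn("users",   concat_ws(",", col("users")))
  .withColumn("groups",  concat_ws(",", col("groups")))
  .select("id", "db_name", "users", "groups",
          "select", "update", "create", "drop", "alter")

result.show(false)
```

Permission types missing for a given group stay null after the pivot, which matches the desired output above; if false is preferred instead, `result.na.fill(false)` would replace the null flags.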