Selecting a column with the SparkSQL when function

Time: 2018-08-17 16:00:22

Tags: scala apache-spark select apache-spark-sql case-when

The SparkSQL documentation describes a when function that returns a Column. The example given is:

people.select(when(people("gender") === "male", 0)
   .when(people("gender") === "female", 1)
   .otherwise(2))

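In this example, the when expression evaluates to the literal 0, 1, or 2. But what if I want the result to be one of the columns of the people DataFrame? For example, given the following data: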

id | name    | gender | testosterone | estrogen
-----------------------------------------------
 1 | Joe     |   male |           10 |        2
 2 | Sue     | female |            3 |       12
 3 | John    |   male |            9 |        3
 4 | Kim     | female |            1 |       10

I want something like this:

SELECT
    name,
    CASE WHEN gender = "male" THEN testosterone
         WHEN gender = "female" THEN estrogen
    END AS hormone_level
FROM
    people

The result would be:

name    | hormone_level
-----------------------
Joe     |            10
Sue     |            12
John    |             9
Kim     |            10

1 answer:

Answer 0 (score: 3)

Just pass the columns themselves as the branch values:

when(people("gender") === "female", people("estrogen"))
  .when(people("gender") === "male", people("testosterone"))
  // .otherwise(???) Add base-case if required
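
For completeness, here is a minimal end-to-end sketch of how the expression above could be used in a select to produce the hormone_level column from the question. The object name HormoneLevel, the local[*] master, and the in-memory sample DataFrame are assumptions made for illustration; they are not part of the original answer. The sketch relies on when accepting a Column as its value argument, and on the documented behaviour that rows matching no branch come back as null when no otherwise is given, which mirrors a SQL CASE without ELSE.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

// Hypothetical wrapper object, used only to make the sketch self-contained
object HormoneLevel {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed setup)
    val spark = SparkSession.builder()
      .appName("hormone-level")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The sample data from the question, built in memory
    val people = Seq(
      (1, "Joe", "male", 10, 2),
      (2, "Sue", "female", 3, 12),
      (3, "John", "male", 9, 3),
      (4, "Kim", "female", 1, 10)
    ).toDF("id", "name", "gender", "testosterone", "estrogen")

    // Each branch yields a Column rather than a literal.
    // No .otherwise(...) is given, so unmatched rows would be null.
    val hormoneLevel =
      when(people("gender") === "male", people("testosterone"))
        .when(people("gender") === "female", people("estrogen"))
        .as("hormone_level")

    // Select the name alongside the computed column and print it
    people.select(people("name"), hormoneLevel).show()

    spark.stop()
  }
}
```

With the four sample rows, this should print the same name and hormone_level pairs as the expected result in the question. If a default is wanted for other gender values, an .otherwise(...) call can be chained before .as("hormone_level").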