Spark SQL: put conditional count results into new columns

Asked: 2017-04-28 03:25:53

Tags: sql scala apache-spark apache-spark-sql

I have data like this:

|  id  |  action  |
|   1  | increase |
|   2  | increase |
|   1  | increase |
|   1  | decrease |
|   3  | decrease |

and I want to get this result:

|  id  | increase | decrease |
|   1  |     2    |     1    |
|   2  |     1    |     0    |
|   3  |     0    |     1    |

I tried something like the following, but it is wrong:

val result = data.groupBy($"id")
  .withColumn("increase", data("action").where(" action == 'increase' ").count)
  .withColumn("decrease", data("action").where(" decrease == 'view' ").count)

35: error: value withColumn is not a member of org.apache.spark.sql.GroupedData

1 Answer:

Answer 0 (score: 4)

You can use groupBy.pivot with count as the aggregation function:

df.groupBy("id").pivot("action").agg(count($"action")).na.fill(0).show
+---+--------+--------+
| id|decrease|increase|
+---+--------+--------+
|  1|       1|       2|
|  3|       1|       0|
|  2|       0|       1|
+---+--------+--------+
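
If the set of action values is known in advance, the same result can also be obtained with conditional aggregation (`sum` over a `when` expression) instead of `pivot`. A minimal sketch, assuming a local SparkSession and rebuilding the sample data from the question (the session setup and variable names are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{sum, when}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("conditional-count")
  .getOrCreate()
import spark.implicits._

// Sample data from the question.
val df = Seq(
  (1, "increase"), (2, "increase"), (1, "increase"),
  (1, "decrease"), (3, "decrease")
).toDF("id", "action")

// One sum(when(...)) per action value: each row contributes 1 to the
// matching column and 0 to the other, so the sums are conditional counts.
val result = df.groupBy($"id").agg(
  sum(when($"action" === "increase", 1).otherwise(0)).as("increase"),
  sum(when($"action" === "decrease", 1).otherwise(0)).as("decrease")
)
result.show()
```

Unlike `pivot`, this produces a fixed, known schema and already yields 0 for missing combinations, so no `na.fill(0)` step is needed; the trade-off is that every action value must be spelled out by hand.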