I have data like this:

| id | action   |
|----|----------|
| 1  | increase |
| 2  | increase |
| 1  | increase |
| 1  | decrease |
| 3  | decrease |
and I want to get this result:

| id | increase | decrease |
|----|----------|----------|
| 1  | 2        | 1        |
| 2  | 1        | 0        |
| 3  | 0        | 1        |
I tried something like this, but it is wrong:

val result = data.groupBy($"id").withColumn("increase", data("action").where(" action == 'increase' ").count).withColumn("decrease", data("action").where(" decrease == 'view' ").count)

and it fails to compile with:

35: error: value withColumn is not a member of org.apache.spark.sql.GroupedData
Answer (score: 4)
You can use groupBy followed by pivot, with count as the aggregation function:
df.groupBy("id").pivot("action").agg(count($"action")).na.fill(0).show
+---+--------+--------+
| id|decrease|increase|
+---+--------+--------+
| 1| 1| 2|
| 3| 1| 0|
| 2| 0| 1|
+---+--------+--------+
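The pivot-and-count logic in the Spark answer can also be illustrated with plain Scala collections, which may help clarify what groupBy("id").pivot("action").agg(count(...)) computes. This is a minimal sketch, not Spark code; the object and method names (PivotCount, pivotCounts) are hypothetical, chosen for this example.

```scala
// A sketch of pivot-and-count over plain Scala collections (no Spark).
// Input: a sequence of (id, action) pairs, as in the question's table.
object PivotCount {
  def pivotCounts(rows: Seq[(Int, String)]): Map[Int, Map[String, Int]] = {
    // The distinct actions become the pivoted columns.
    val actions = rows.map(_._2).distinct
    rows
      .groupBy(_._1) // group rows by id
      .map { case (id, group) =>
        // count occurrences of each action within this id's group
        val counts = group.groupBy(_._2).map { case (a, g) => a -> g.size }
        // fill absent action columns with 0, mirroring .na.fill(0)
        id -> actions.map(a => a -> counts.getOrElse(a, 0)).toMap
      }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1 -> "increase", 2 -> "increase", 1 -> "increase",
                   1 -> "decrease", 3 -> "decrease")
    pivotCounts(data).toSeq.sortBy(_._1).foreach { case (id, c) =>
      println(s"id=$id increase=${c("increase")} decrease=${c("decrease")}")
    }
  }
}
```

Running this prints one line per id with the same counts as the Spark output above (id 1: increase=2, decrease=1; id 2: 1/0; id 3: 0/1).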