Spark SQL: put conditional count results into new columns

Asked: 2017-04-28 03:25:53

Tags: sql scala apache-spark apache-spark-sql

I have data like this:

|  id  |  action  |
|   1  | increase |
|   2  | increase |
|   1  | increase |
|   1  | decrease |
|   3  | decrease |

and I want to get this result:

|  id  | increase | decrease |
|   1  |     2    |     1    |
|   2  |     1    |     0    |
|   3  |     0    |     1    |

I tried something like the following, but it is wrong:

val result = data.groupBy($"id")
  .withColumn("increase", data("action").where(" action == 'increase' ").count)
  .withColumn("decrease", data("action").where(" decrease == 'view' ").count)

35: error: value withColumn is not a member of org.apache.spark.sql.GroupedData

1 Answer:

Answer 0 (score: 4)

You can use groupBy.pivot with count as the aggregation function:

df.groupBy("id").pivot("action").agg(count($"action")).na.fill(0).show
+---+--------+--------+
| id|decrease|increase|
+---+--------+--------+
|  1|       1|       2|
|  3|       1|       0|
|  2|       0|       1|
+---+--------+--------+
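
If the set of action values is known in advance, the same result can also be obtained with conditional aggregation (`sum` over a `when` expression) instead of `pivot`. A minimal sketch, assuming a local SparkSession and rebuilding the sample data from the question (the session setup and variable names are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{sum, when}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("conditional-count")
  .getOrCreate()
import spark.implicits._

// Sample data from the question.
val df = Seq(
  (1, "increase"), (2, "increase"), (1, "increase"),
  (1, "decrease"), (3, "decrease")
).toDF("id", "action")

// One sum(when(...)) per action value: each row contributes 1 to the
// matching column and 0 to the other, so the sums are conditional counts.
val result = df.groupBy($"id").agg(
  sum(when($"action" === "increase", 1).otherwise(0)).as("increase"),
  sum(when($"action" === "decrease", 1).otherwise(0)).as("decrease")
)
result.show()
```

Unlike `pivot`, this produces a fixed, known schema and already yields 0 for missing combinations, so no `na.fill(0)` step is needed; the trade-off is that every action value must be spelled out by hand.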