I have table data like the following, and I want to pivot it with an aggregation:
ColumnA  ColumnB          ColumnC
1        complete         Yes
1        complete         Yes
2        In progress      No
2        In progress      No
3        Not yet started  initiate
3        Not yet started  initiate
I want the result to look like this:
ColumnA  Complete  In progress  Not yet started
1        2         0            0
2        0         2            0
3        0         0            2
Is there any way to achieve this in Hive or Impala?
Answer 0 (score: 2)
Use case expressions together with the sum aggregate (conditional aggregation):
select ColumnA,
sum(case when ColumnB='complete' then 1 else 0 end) as Complete,
sum(case when ColumnB='In progress' then 1 else 0 end) as In_progress,
sum(case when ColumnB='Not yet started' then 1 else 0 end) as Not_yet_started
from your_table -- placeholder name; "table" itself is a reserved word in Hive and Impala
group by ColumnA
order by ColumnA -- remove if the ordering is not needed
;
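To make clear what the query computes, here is a minimal sketch in plain Scala collections (no Hive or Spark involved) of the same conditional-aggregation idea: each case expression becomes a 0/1 indicator, and summing those indicators per ColumnA yields the count for that status. The ConditionalAggregation object, the Row case class and the hard-coded status list are illustrative assumptions. Like the SQL, it only produces the columns you list explicitly.

```scala
// Plain-Scala sketch of sum(case when ... then 1 else 0 end) ... group by ColumnA.
// The object name, the Row case class and the sample rows are illustrative assumptions.
object ConditionalAggregation {
  final case class Row(columnA: Int, columnB: String, columnC: String)

  val rows: List[Row] = List(
    Row(1, "complete", "Yes"),
    Row(1, "complete", "Yes"),
    Row(2, "In progress", "No"),
    Row(2, "In progress", "No"),
    Row(3, "Not yet started", "initiate"),
    Row(3, "Not yet started", "initiate")
  )

  // Equivalent of: case when ColumnB = status then 1 else 0 end
  private def indicator(r: Row, status: String): Int =
    if (r.columnB == status) 1 else 0

  // One tuple per ColumnA: (ColumnA, Complete, In_progress, Not_yet_started)
  def pivot(input: List[Row]): List[(Int, Int, Int, Int)] =
    input
      .groupBy(_.columnA) // group by ColumnA
      .toList
      .map { case (a, group) =>
        (
          a,
          group.map(indicator(_, "complete")).sum,
          group.map(indicator(_, "In progress")).sum,
          group.map(indicator(_, "Not yet started")).sum
        )
      }
      .sortBy(_._1) // order by ColumnA

  def main(args: Array[String]): Unit =
    pivot(rows).foreach(println)
}
```

Note that the string comparison is case-sensitive, just as in the SQL: 'complete' matches the sample data, whereas 'Complete' would not.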
Answer 1 (score: 0)
Here is how to do it in Spark with Scala:
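// Assumes a SparkSession named `spark`, as predefined in spark-shell
import org.apache.spark.sql.functions.count
import spark.implicits._ // enables .toDF on an RDD of tuples

val test = spark.sparkContext.parallelize(List(
  ("1", "Complete", "yes"),
  ("1", "Complete", "yes"),
  ("2", "Inprogress", "no"),
  ("2", "Inprogress", "no"),
  ("3", "Not yet started", "initiate"),
  ("3", "Not yet started", "initiate")
)).toDF("ColumnA", "ColumnB", "ColumnC")
test.show()

// One row per ColumnA, one column per distinct ColumnB value, cells = row counts
val test_pivot = test.groupBy("ColumnA")
  .pivot("ColumnB")
  .agg(count("ColumnC"))

// Combinations that never occur may come back as null; replace them with 0
test_pivot.na.fill(0).show(false)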
and the output:
+-------+--------+----------+---------------+
|ColumnA|Complete|Inprogress|Not yet started|
+-------+--------+----------+---------------+
|3      |0       |0         |2              |
|1      |2       |0         |0              |
|2      |0       |2         |0              |
+-------+--------+----------+---------------+
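Unlike the case/sum query, pivot discovers the output columns from the data. When no values are passed, Spark first computes the sorted distinct values of the pivot column. Passing them explicitly, e.g. pivot("ColumnB", Seq("Complete", "Inprogress", "Not yet started")), skips that extra pass. Below is a rough sketch of the same two-step idea in plain Scala collections. It is not Spark's actual implementation, and the DynamicPivot object and pivotCounts function are hypothetical names.

```scala
// Plain-Scala sketch of a dynamic pivot: the output columns come from the data.
// DynamicPivot and pivotCounts are hypothetical names, not a Spark API.
object DynamicPivot {
  // (ColumnA, ColumnB, ColumnC), same sample rows as the Spark example
  val rows: Seq[(String, String, String)] = Seq(
    ("1", "Complete", "yes"),
    ("1", "Complete", "yes"),
    ("2", "Inprogress", "no"),
    ("2", "Inprogress", "no"),
    ("3", "Not yet started", "initiate"),
    ("3", "Not yet started", "initiate")
  )

  // Returns the pivot column names and, per ColumnA value, one count per column
  def pivotCounts(input: Seq[(String, String, String)]): (Seq[String], Seq[(String, Seq[Long])]) = {
    // Step 1: the sorted distinct ColumnB values become the output columns
    val columns = input.map(_._2).distinct.sorted
    // Step 2: count rows per (ColumnA, ColumnB) pair
    val counts: Map[(String, String), Long] =
      input.groupBy(r => (r._1, r._2)).map { case (k, v) => k -> v.size.toLong }
    // Step 3: one output row per ColumnA; absent pairs become 0 (like na.fill(0))
    val keys = input.map(_._1).distinct.sorted
    val table = keys.map(a => a -> columns.map(c => counts.getOrElse((a, c), 0L)))
    (columns, table)
  }

  def main(args: Array[String]): Unit = {
    val (columns, table) = pivotCounts(rows)
    println(("ColumnA" +: columns).mkString("|"))
    table.foreach { case (a, cs) => println((a +: cs.map(_.toString)).mkString("|")) }
  }
}
```

The sketch sorts its rows by ColumnA. Spark makes no such guarantee after groupBy, which is why the rows above appear as 3, 1, 2.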