I have a data frame like the following:
A B C D
foo one small 1
foo one large 2
foo one large 2
foo two small 3
I need to groupBy A and B, pivot on column C, and sum column D.
I can do this with
df.groupBy("A", "B").pivot("C").sum("D")
However, I also need a count after the groupBy. If I try something like
df.groupBy("A", "B").pivot("C").agg(sum("D"), count)
I get output with columns like
A B large small large_count small_count
Is there a way to get only one count, computed after the groupBy and before the pivot?
Answer 0: (score: 0)
Try this on the output:
output.withColumn("count", $"large_count" + $"small_count").show
You can then drop the two count columns if you don't need them.
Alternatively, compute the count before the pivot:
df.groupBy("A", "B").agg(count("C"))
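The first suggestion can be fleshed out as a small sketch. It assumes the pivot is run with aliased aggregates, so that Spark names the pivoted columns `large_sum`, `large_count`, `small_sum` and `small_count`, and it wraps each count in `coalesce` because a pivot cell with no matching rows typically comes back as null rather than 0 (which would make the plain `+` null as well). The names `PivotWithCount` and `pivotWithCount` are hypothetical, and the pivot values `large` and `small` are taken from the sample data.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col, count, lit, sum}

object PivotWithCount {
  // Hypothetical helper: pivot C while summing D, and keep a single
  // row count per (A, B) group instead of one count per pivot value.
  def pivotWithCount(df: DataFrame): DataFrame = {
    val pivoted = df
      .groupBy("A", "B")
      // Listing the pivot values explicitly fixes the output columns
      // (assumption: only "large" and "small" occur in C).
      .pivot("C", Seq("large", "small"))
      .agg(sum("D").as("sum"), count("D").as("count"))

    // Expected columns here: A, B, large_sum, large_count, small_sum, small_count.
    pivoted
      .withColumn(
        "count",
        // Treat a missing pivot cell as a count of 0 before adding.
        coalesce(col("large_count"), lit(0L)) + coalesce(col("small_count"), lit(0L))
      )
      .drop("large_count", "small_count")
  }
}
```

In spark-shell this could be called as `PivotWithCount.pivotWithCount(df).show`. If C can hold values other than `large` and `small`, the list of count columns would have to be built from the actual pivot values.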
Answer 1: (score: 0)
Is this what you are expecting?
val df = Seq(("foo", "one", "small", 1),
("foo", "one", "large", 2),
("foo", "one", "large", 2),
("foo", "two", "small", 3)).toDF("A","B","C","D")
scala> df.show
+---+---+-----+---+
| A| B| C| D|
+---+---+-----+---+
|foo|one|small| 1|
|foo|one|large| 2|
|foo|one|large| 2|
|foo|two|small| 3|
+---+---+-----+---+
scala> val df2 = df.groupBy('A,'B).pivot("C").sum("D")
df2: org.apache.spark.sql.DataFrame = [A: string, B: string ... 2 more fields]
scala> val df3 = df.groupBy('A as "A1",'B as "B1").agg(sum('D) as "sumd")
df3: org.apache.spark.sql.DataFrame = [A1: string, B1: string ... 1 more field]
scala> df3.join(df2,'A==='A1 and 'B==='B1,"inner").select("A","B","sumd","large","small").show
+---+---+----+-----+-----+
| A| B|sumd|large|small|
+---+---+----+-----+-----+
|foo|one| 5| 4| 1|
|foo|two| 3| null| 3|
+---+---+----+-----+-----+
Answer 2: (score: 0)
This does not need a join. Is this what you are looking for?
val df = Seq(("foo", "one", "small", 1),
("foo", "one", "large", 2),
("foo", "one", "large", 2),
("foo", "two", "small", 3)).toDF("A","B","C","D")
scala> df.show
+---+---+-----+---+
| A| B| C| D|
+---+---+-----+---+
|foo|one|small| 1|
|foo|one|large| 2|
|foo|one|large| 2|
|foo|two|small| 3|
+---+---+-----+---+
df.createOrReplaceTempView("dummy")
spark.sql("SELECT * FROM (SELECT A , B , C , sum(D) as D from dummy group by A,B,C grouping sets ((A,B,C) ,(A,B)) order by A nulls last , B nulls last , C nulls last) dummy pivot (first(D) for C in ('large' large ,'small' small , null total))").show
+---+---+-----+-----+-----+
| A| B|large|small|total|
+---+---+-----+-----+-----+
|foo|one| 4| 1| 5|
|foo|two| null| 3| 3|
+---+---+-----+-----+-----+