I have a table in PySpark built with the crosstab function, as shown below:
df = sqlContext.createDataFrame(
    [(1, 2, "a"), (3, 2, "a"), (1, 3, "b"), (2, 2, "a"), (2, 3, "b")],
    ["time", "value", "class"],
)
tabla = df.crosstab("value", "class")
tabla.withColumn("Total", tabla.a + tabla.b).show()
+-----------+---+---+-----+
|value_class|  a|  b|Total|
+-----------+---+---+-----+
|          2|  4|  0|    4|
|          4|  1|  2|    3|
|          3|  1|  4|    5|
+-----------+---+---+-----+
I need to add a new column that holds the cumulative sum of "Total".
Answer 0 (score: 0)
Hope this helps:
This is only one example; you can build the window with partitionBy, orderBy, and so on.
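For reference, here is a minimal sketch of how a cumulative sum over an ordered window could look. The function names `with_cumulative_total` and `running_totals` and the output column name `CumulativeTotal` are my own choices for illustration, not from the question. It assumes Spark 2.1 or later (for `Window.unboundedPreceding` and `Window.currentRow`) and orders by the `value_class` column. crosstab usually returns that label column as a string, so cast it first if you need numeric ordering.

```python
from itertools import accumulate


def with_cumulative_total(df, order_col="value_class", value_col="Total",
                          out_col="CumulativeTotal"):
    """Append a running sum of value_col, ordered by order_col, to a Spark DataFrame."""
    # Imported here so this sketch can be loaded even where Spark is not installed.
    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # No partitionBy: the whole table is treated as a single window.
    # Spark moves all rows to one partition for this, which is fine for a small crosstab.
    w = (Window.orderBy(order_col)
               .rowsBetween(Window.unboundedPreceding, Window.currentRow))
    return df.withColumn(out_col, F.sum(value_col).over(w))


def running_totals(rows, order_key=0, value_key=1):
    """Plain-Python reference for the same computation, handy for checking results."""
    ordered = sorted(rows, key=lambda r: r[order_key])
    sums = accumulate(r[value_key] for r in ordered)
    return [tuple(r) + (s,) for r, s in zip(ordered, sums)]
```

With the table from the question this could be called as `with_cumulative_total(tabla.withColumn("Total", tabla.a + tabla.b)).show()`. As a sanity check on the logic, `running_totals([(2, 4), (4, 3), (3, 5)])` returns `[(2, 4, 4), (3, 5, 9), (4, 3, 12)]` for the (value_class, Total) pairs shown above. If you want a separate running sum per group, add `Window.partitionBy(...)` in front of `orderBy`.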