Here is my data:
class x1 x2
c 6 90
b 5 50
c 3 70
b 9 40
a 5 30
b 1 60
a 7 20
c 4 80
a 2 10
I first want to sort by class (increasing or decreasing does not matter) and then by x1 (decreasing), so I do the following:
df <- df[with(df, order(class, x1, decreasing = TRUE)), ]
class x1 x2
c 6 90
c 4 80
c 3 70
b 9 40
b 5 50
b 1 60
a 7 20
a 5 30
a 2 10
Then I would like the cumulative sum of x2 within each class:
class x1 x2 cumsum
c 6 90 90
c 4 80 170 # 90+80
c 3 70 240 # 90+80+70
b 9 40 40
b 5 50 90 # 40+50
b 1 60 150 # 40+50+60
a 7 20 20
a 5 30 50 # 20+30
a 2 10 60 # 20+30+10
Following this answer, I did this:
df$cumsum <- unlist(by(df$x2, df$class, cumsum))
# (Also tried this, same result)
df$cumsum <- unlist(by(df[, "x2"], df[, "class"], cumsum))
But what I get is the cumulative sum over the whole set, plus errors. More specifically, this is what I get:
class x1 x2 cumsum
c 6 90 20 # this cumsum
c 4 80 50 # and this cumsum
c 3 70 60 # and this cumsum are the cumsum of the lines of class a,
b 9 40 100 # then it adds the 'x2' values of class b : 60 ('cumsum' from the previous line) + 40
b 5 50 150 # and keeps doing so : 100 + 50
b 1 60 210 # 150 + 60
a 7 20 300 # 210 + 90
a 5 30 380 # 300 + 80
a 2 10 450 # 380 + 70
Any ideas on how to fix this? Thanks.
Answer 0 (score: 3)
dplyr also works here:
library(dplyr)
df %>%
group_by(class) %>%
arrange(desc(x1)) %>%
mutate(cumsum=cumsum(x2))
## class x1 x2 cumsum
## (fctr) (int) (int) (int)
## 1 a 7 20 20
## 2 a 5 30 50
## 3 a 2 10 60
## 4 b 9 40 40
## 5 b 5 50 90
## 6 b 1 60 150
## 7 c 6 90 90
## 8 c 4 80 170
## 9 c 3 70 240
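As described here (https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html) and elsewhere, group_by together with arrange meant that the data was sorted by the grouping variable first in the dplyr version used for the output above. Newer dplyr releases ignore grouping in arrange() unless .by_group = TRUE is passed, so the row order may differ there; the per-class cumulative sums are unaffected.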
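To avoid depending on how arrange treats groups in a given dplyr version, the sort can be spelled out before grouping. The following is only a sketch: the data frame is rebuilt from the question with assumed column types (character class, numeric x1 and x2), and res is a name introduced here for illustration.

```r
library(dplyr)

# Example data from the question (column types are an assumption)
df <- data.frame(
  class = c("c", "b", "c", "b", "a", "b", "a", "c", "a"),
  x1 = c(6, 5, 3, 9, 5, 1, 7, 4, 2),
  x2 = c(90, 50, 70, 40, 30, 60, 20, 80, 10)
)

res <- df %>%
  arrange(desc(class), desc(x1)) %>%  # explicit sort: class, then x1, both decreasing
  group_by(class) %>%
  mutate(cumsum = cumsum(x2)) %>%     # running total of x2 within each class
  ungroup()
```

Sorting first and grouping afterwards keeps the row order requested in the question (c, b, a), while mutate still computes the sum per class.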
Answer 1 (score: 2)
We can use data.table:
library(data.table)
setDT(df)[, x2:= cumsum(x2) , class]
df
# class x1 x2
#1: c 6 90
#2: c 4 170
#3: c 3 240
#4: b 9 40
#5: b 5 90
#6: b 1 150
#7: a 7 20
#8: a 5 50
#9: a 2 60
Note: the code above was run on the already sorted data. If we also need to sort:
setorder(setDT(df), -class, -x1)[, x2:=cumsum(x2), class]
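Both calls above overwrite x2 with its running total. If the original x2 should be kept and the result stored in a separate cumsum column, as in the desired output of the question, something along these lines might work. It is a sketch under assumptions: the data frame is rebuilt from the question with assumed column types, and dt is a name introduced here so that df is not modified by reference.

```r
library(data.table)

# Example data from the question (column types are an assumption)
df <- data.frame(
  class = c("c", "b", "c", "b", "a", "b", "a", "c", "a"),
  x1 = c(6, 5, 3, 9, 5, 1, 7, 4, 2),
  x2 = c(90, 50, 70, 40, 30, 60, 20, 80, 10)
)

dt <- as.data.table(df)                  # a copy; df itself stays untouched
setorder(dt, -class, -x1)                # sort by class, then x1, both decreasing
dt[, cumsum := cumsum(x2), by = class]   # new column, x2 is preserved
```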
Answer 2 (score: 0)
You can use base R transform with ave and cumsum, grouped by the class column:
transform(df[order(df$class, decreasing = TRUE), ], cumsum = ave(x2, class, FUN = cumsum))
# class x1 x2 cumsum
#1 c 6 90 90
#3 c 3 70 160
#8 c 4 80 240
#2 b 5 50 50
#4 b 9 40 90
#6 b 1 60 150
#5 a 5 30 30
#7 a 7 20 50
#9 a 2 10 60
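The output above is sorted by class only, so x1 is not decreasing within each class. A variant that also sorts by x1 and therefore reproduces the table requested in the question could look like the sketch below. The data frame is rebuilt from the question with assumed column types; ave is used because it returns each value at the position of its original row, so no reordering of the result is needed.

```r
# Example data from the question (column types are an assumption)
df <- data.frame(
  class = c("c", "b", "c", "b", "a", "b", "a", "c", "a"),
  x1 = c(6, 5, 3, 9, 5, 1, 7, 4, 2),
  x2 = c(90, 50, 70, 40, 30, 60, 20, 80, 10)
)

# Sort by class, then x1, both decreasing (note the trailing comma: rows, not columns)
df <- df[order(df$class, df$x1, decreasing = TRUE), ]

# Running total of x2 within each class, aligned with the rows of df
df$cumsum <- ave(df$x2, df$class, FUN = cumsum)
```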