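I have a df in which column A is a unique identifier, column B is effectively a cumulative grouping that always starts from 0, and column C is a total. I want a column D holding the running total of C within each B group, so that it restarts whenever B goes back to 0.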
For example, starting with:
A = 1:20
B = c(0,1,2,3,4,0,1,0,1,2,3,4,5,6,7,8,0,1,2,3)
C = c(0,4,1,7,0,1,2,5,4,3,2,1,4,8,7,2,1,2,3,4)
test = data.frame(A, B, C)
The test df:
A B C
1 1 0 0
2 2 1 4
3 3 2 1
4 4 3 7
5 5 4 0
6 6 0 1
7 7 1 2
8 8 0 5
9 9 1 4
10 10 2 3
11 11 3 2
12 12 4 1
13 13 5 4
14 14 6 8
15 15 7 7
16 16 8 2
17 17 0 1
18 18 1 2
19 19 2 3
20 20 3 4
I would like to add a column like this:
A B C total
1 1 0 0 0
2 2 1 4 4
3 3 2 1 5
4 4 3 7 12
5 5 4 0 12
6 6 0 1 1
7 7 1 2 3
8 8 0 5 5
9 9 1 4 9
10 10 2 3 12
11 11 3 2 14
12 12 4 1 15
13 13 5 4 19
14 14 6 8 27
15 15 7 7 34
16 16 8 2 36
17 17 0 1 1
18 18 1 2 3
19 19 2 3 6
20 20 3 4 10
I have tried various for and while loops, but I cannot get them to work:
test$total <- 0
for (i in test$A) {
  if(test$B == 0) {
    test$total <- test$B
  } else {
    test[i,4] <- test[i,3] + test[(i-1), 2]
  }
}
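For reference, the loop idea can work once each step looks at a single row. In the attempt above, `if (test$B == 0)` compares the whole column at once, which `if()` rejects (an error since R 4.2.0, a warning before), and the else branch adds column 2 (B) instead of the previous row's total. A minimal corrected sketch, assuming the data is built as in the question (it is re-created here so the block runs on its own):

```r
A <- 1:20
B <- c(0,1,2,3,4,0,1,0,1,2,3,4,5,6,7,8,0,1,2,3)
C <- c(0,4,1,7,0,1,2,5,4,3,2,1,4,8,7,2,1,2,3,4)
test <- data.frame(A, B, C)

test$total <- 0
for (i in seq_len(nrow(test))) {
  if (test$B[i] == 0) {
    # B == 0 marks the start of a new group: restart the running total
    test$total[i] <- test$C[i]
  } else {
    # otherwise add this row's C to the previous row's total
    test$total[i] <- test$total[i - 1] + test$C[i]
  }
}
test$total
```

This assumes the first row has B == 0, as in the example; otherwise `test$total[i - 1]` would try to read row 0. The vectorized answers below avoid the loop entirely.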
Answer 0 (score: 1)
You can use dplyr:
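library(dplyr)

test %>%
  group_by(id = cumsum(B == 0)) %>%  # new group id each time B restarts at 0
  mutate(D = cumsum(C)) %>%          # running total of C within each group
  ungroup() %>%
  select(-id)                        # drop the helper id column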
This returns:
# A tibble: 20 x 4
A B C D
<int> <dbl> <dbl> <dbl>
1 1 0 0 0
2 2 1 4 4
3 3 2 1 5
4 4 3 7 12
5 5 4 0 12
6 6 0 1 1
7 7 1 2 3
8 8 0 5 5
9 9 1 4 9
10 10 2 3 12
11 11 3 2 14
12 12 4 1 15
13 13 5 4 19
14 14 6 8 27
15 15 7 7 34
16 16 8 2 36
17 17 0 1 1
18 18 1 2 3
19 19 2 3 6
20 20 3 4 10
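The key step in the dplyr answer is `cumsum(B == 0)`: comparing B to 0 gives TRUE at each restart, and the cumulative sum of those TRUEs (counted as 1) assigns one integer id per run of rows. Shown on its own with the example B vector (the names `starts` and `group_id` are illustrative only):

```r
B <- c(0,1,2,3,4,0,1,0,1,2,3,4,5,6,7,8,0,1,2,3)

# TRUE at every row where B restarts at 0
starts <- B == 0
# cumsum() treats TRUE as 1, so the id increases by one at each restart
group_id <- cumsum(starts)
group_id
# group ids: 1 1 1 1 1 2 2 3 3 3 3 3 3 3 3 3 4 4 4 4
```

`group_by()` then treats each id as a separate group, so `cumsum(C)` restarts at every run.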
Answer 1 (score: 1)
I see there is already an accepted answer. I just wanted to show that this can be done fairly easily in base R:
# cumsum(B == 0) labels each run; tapply() applies cumsum() to C within each label
test$total <- unlist(tapply(test$C, cumsum(test$B == 0), cumsum))
A B C total
1 1 0 0 0
2 2 1 4 4
3 3 2 1 5
4 4 3 7 12
5 5 4 0 12
6 6 0 1 1
7 7 1 2 3
8 8 0 5 5
9 9 1 4 9
10 10 2 3 12
11 11 3 2 14
12 12 4 1 15
13 13 5 4 19
14 14 6 8 27
15 15 7 7 34
16 16 8 2 36
17 17 0 1 1
18 18 1 2 3
19 19 2 3 6
20 20 3 4 10
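A related base R option, offered here as an alternative rather than as part of the answer above, is `ave()`. `unlist(tapply(...))` concatenates the results in sorted group order, which lines up with the rows only because `cumsum(B == 0)` never decreases; `ave()` writes each group's result back into its original row positions, so it does not rely on that ordering. A minimal sketch using the question's data:

```r
A <- 1:20
B <- c(0,1,2,3,4,0,1,0,1,2,3,4,5,6,7,8,0,1,2,3)
C <- c(0,4,1,7,0,1,2,5,4,3,2,1,4,8,7,2,1,2,3,4)
test <- data.frame(A, B, C)

# ave() applies FUN within each group and returns a vector aligned
# with the original rows, so no unlist() or reordering is needed
test$total <- ave(test$C, cumsum(test$B == 0), FUN = cumsum)
test$total
```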
Answer 2 (score: 0)
You could try the following approach:
library(zoo)
library(dplyr)
#Data
DF <- structure(list(V1 = 1:20, A = 1:20, B = c(0L, 1L, 2L, 3L, 4L,
0L, 1L, 0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 0L, 1L, 2L, 3L),
C = c(0L, 4L, 1L, 7L, 0L, 1L, 2L, 5L, 4L, 3L, 2L, 1L, 4L,
8L, 7L, 2L, 1L, 2L, 3L, 4L)), class = "data.frame", row.names = c(NA,
-20L))
#Create index
index <- which(DF$B==0)
# Create one label per run (seq_along() instead of letters, which runs out after 26 runs)
val <- seq_along(index)
#Create empty var
DF$I <- NA
#Assign
DF$I[index]<-val
#Fill
DF$I <- na.locf(DF$I)
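# Running total of C within each label; keep A, B, C and D
# (select(-4) would drop column 4, which is C, and keep the helper columns V1 and I)
DF1 <- DF %>% group_by(I) %>% mutate(D = cumsum(C)) %>% ungroup() %>% select(A, B, C, D)
DF1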
# A tibble: 20 x 4
A B C D
<int> <int> <int> <int>
1 1 0 0 0
2 2 1 4 4
3 3 2 1 5
4 4 3 7 12
5 5 4 0 12
6 6 0 1 1
7 7 1 2 3
8 8 0 5 5
9 9 1 4 9
10 10 2 3 12
11 11 3 2 14
12 12 4 1 15
13 13 5 4 19
14 14 6 8 27
15 15 7 7 34
16 16 8 2 36
17 17 0 1 1
18 18 1 2 3
19 19 2 3 6
20 20 3 4 10
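All three answers group the rows first. For comparison, here is a grouping-free sketch (an alternative not taken from any answer above): take one overall `cumsum(C)` and subtract, for each row, the total that had already accumulated before its run started. It assumes, as in the example, that the first row has B == 0; the names `running`, `group_id`, `starts`, `offset` and `total` are illustrative only.

```r
B <- c(0,1,2,3,4,0,1,0,1,2,3,4,5,6,7,8,0,1,2,3)
C <- c(0,4,1,7,0,1,2,5,4,3,2,1,4,8,7,2,1,2,3,4)

running  <- cumsum(C)        # overall running total, ignoring groups
group_id <- cumsum(B == 0)   # one integer id per run (first row must have B == 0)
starts   <- which(B == 0)    # row index where each run begins
# total accumulated before each run begins (running sum minus that row's own C)
offset   <- (running - C)[starts]
# subtract each row's run offset to restart the total at every B == 0
total    <- running - offset[group_id]
total
```

With non-integer C, subtracting large running sums can introduce small floating-point differences, so the grouped approaches are the safer default in that case.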