Here is a data frame:
ID<-c(rep("A",3),rep("B",2), rep("C",3),rep("D",5))
cell<-c("a1","a2","a3","a1","a2","a1","a2", "a3","a1","a2","a1","a2","a3")
value<-c(2,5,3,4,5,6,9,8,7,2,5,2,4)
# data.frame() keeps value numeric; cbind() would coerce every column to character
df <- data.frame(ID, cell, value)
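For each ID, I want to compute the sum of all values up to and including cell a2. The order of the cells and of the IDs must be respected. If no "a2" cell follows once a sum has been computed, the remaining rows should be ignored.
So I would like to obtain this table:
Could you help me code this? Thanks in advance. Best regards, Inna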
Answer 0 (score: 1)
Assuming the data is already in the correct order by cell:
library(tidyverse)
df %>%
  group_by(ID) %>%
  mutate(value = cumsum(value)) %>%
  filter(cell == "a2")
# A tibble: 5 x 3
# Groups:   ID [4]
#   ID    cell  value
#   <chr> <chr> <dbl>
# 1 A     a2        7
# 2 B     a2        9
# 3 C     a2       15
# 4 D     a2        9
# 5 D     a2       16
Answer 1 (score: 1)
We can treat every occurrence of "a2" as a separate group:
library(dplyr)
df %>%
  # Create a group column with every value of cell == 'a2' as different group
  group_by(ID, grp = cumsum(lag(cell == 'a2', default = TRUE))) %>%
  # Remove those groups that do not have 'a2' in them
  filter(any(cell == 'a2')) %>%
  # Sum till 'a2' value
  summarise(value = sum(value[seq_len(match('a2', cell))]),
            cell = last(cell)) %>%
  select(-grp)
#   ID    value cell
#   <chr> <dbl> <chr>
# 1 A         7 a2
# 2 B         9 a2
# 3 C        15 a2
# 4 D         9 a2
# 5 D         7 a2
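To make the grouping column easier to follow, here is a minimal base R sketch of the same idea. It is not the answerer's code: the helper names is_a2, starts, grp, key and has_a2 are made up for illustration, and it assumes value is numeric. A new block starts on the first row of each ID and on the row right after every "a2", which is why the second "a2" of D gets 5 + 2 = 7 here rather than the running total of 16 shown in the other answers.

```r
ID    <- c(rep("A", 3), rep("B", 2), rep("C", 3), rep("D", 5))
cell  <- c("a1", "a2", "a3", "a1", "a2", "a1", "a2", "a3",
           "a1", "a2", "a1", "a2", "a3")
value <- c(2, 5, 3, 4, 5, 6, 9, 8, 7, 2, 5, 2, 4)
df <- data.frame(ID, cell, value)

is_a2 <- df$cell == "a2"

# Within each ID, flag the first row and every row that follows an "a2"
starts <- ave(as.integer(is_a2), df$ID,
              FUN = function(x) c(1L, head(x, -1)))

# Running count of block starts per ID gives the block number
df$grp <- ave(starts, df$ID, FUN = cumsum)
# For ID "D" the blocks are 1, 1, 2, 2, 3

# Keep only blocks that contain an "a2"; in those blocks "a2" is the last row
key    <- paste(df$ID, df$grp)
has_a2 <- ave(is_a2, key, FUN = any)

res <- aggregate(value ~ ID + grp, data = df[has_a2, ], FUN = sum)
res <- res[order(res$ID, res$grp), ]
res$value
# 7 9 15 9 7
```

Rows that come after the last "a2" of an ID end up in a block without any "a2" and are dropped by the has_a2 filter.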
Answer 2 (score: 1)
A concise solution using ave.
r <- transform(df, value=ave(value, ID, FUN=cumsum))[df$cell == "a2", ]
r
#    ID cell value
# 2   A   a2     7
# 5   B   a2     9
# 7   C   a2    15
# 10  D   a2     9
# 12  D   a2    16
Answer 3 (score: 0)
With data.table:
library(data.table)
setDT(df)[, value := cumsum(value) , ID][cell == 'a2']
Output:
#    ID cell value
# 1:  A   a2     7
# 2:  B   a2     9
# 3:  C   a2    15
# 4:  D   a2     9
# 5:  D   a2    16
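One thing to keep in mind: setDT() and := both work by reference, so the one-liner above converts df in place and overwrites its value column with the running totals. If the original values are still needed afterwards, a sketch along these lines leaves df untouched. It assumes value is numeric, and dt, res and the column name total are illustrative names, not part of the answer.

```r
library(data.table)

ID    <- c(rep("A", 3), rep("B", 2), rep("C", 3), rep("D", 5))
cell  <- c("a1", "a2", "a3", "a1", "a2", "a1", "a2", "a3",
           "a1", "a2", "a1", "a2", "a3")
value <- c(2, 5, 3, 4, 5, 6, 9, 8, 7, 2, 5, 2, 4)
df <- data.frame(ID, cell, value)

# as.data.table() returns a copy, so df itself is not modified
dt <- as.data.table(df)

# Store the per-ID running total in a new column instead of overwriting value
dt[, total := cumsum(value), by = ID]

# Keep only the "a2" rows
res <- dt[cell == "a2"]
res$total
# 7 9 15 9 16
```

The result keeps both the original value and the running total, which makes it easy to check the sums by eye.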