我有一个数据框,按groupID
和date
排序:
d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
value = c(1,1,25,1,1,25,1,25,1))
> d1
groupID date value
1 1 1
1 2 1
1 3 25
1 4 1
1 5 1
3 6 25
3 7 1
3 8 25
3 9 1
我想创建两个新列:
期望的输出:
groupID date value Prev1s After1s
1 1 1
1 2 1
1 3 25 2 2
1 4 1
1 5 1
3 6 25 0 1
3 7 1
3 8 25 1 1
3 9 1
我可以通过创建计数器并获取之前的值来使用Excel。我尝试使用sum
,shift()
在R中实现相同功能但是徒劳无功。
答案 0 :(得分:1)
您可以使用dplyr
...
library(dplyr)
#first set up some grouping variables based on runs before and after 25s
d1 <- d1 %>% mutate(PrevGp=cumsum(lag(value==25,default = 1)),
AfterGp=cumsum(value==25)) %>%
#use these to calculate the values you want for each group
group_by(groupID,PrevGp) %>% mutate(Prev1s=sum(value)-25) %>%
group_by(groupID,AfterGp) %>% mutate(After1s=sum(value)-25) %>%
ungroup() %>%
#remove values (set to "") other than for value==25
mutate(Prev1s=replace(Prev1s,value!=25,""),
After1s=replace(After1s,value!=25,"")) %>%
#and remove the grouping variables
select(-c(PrevGp,AfterGp))
d1
# A tibble: 9 x 5
groupID date value Prev1s After1s
<dbl> <dbl> <dbl> <chr> <chr>
1 1 1 1
2 1 2 1
3 1 3 25 2 2
4 1 4 1
5 1 5 1
6 3 6 25 0 1
7 3 7 1
8 3 8 25 1 1
9 3 9 1
答案 1 :(得分:0)
将data.table
- 包与rle
- 函数结合使用的替代方法:
library(data.table)
setDT(d1)[, c('prev1s','after1s') := {p <- a <- rle(value);
i <- p$values == 25;
p$values[i] <- shift(p$lengths, fill = 0)[i];
a$values[i] <- shift(a$lengths, type = 'lead', fill = 0)[i];
p$values[!i] <- a$values[!i] <- NA;
list(inverse.rle(p),inverse.rle(a))},
by = groupID][]
给出:
groupID date value prev1s after1s 1: 1 1 1 NA NA 2: 1 2 1 NA NA 3: 1 3 25 2 2 4: 1 4 1 NA NA 5: 1 5 1 NA NA 6: 3 6 25 0 1 7: 3 7 1 NA NA 8: 3 8 25 1 1 9: 3 9 1 NA NA