My data frame looks like this:
df <- data.frame(
Item=c("A","A","A","A","A","B","B","B","B","B"),
Date=c("2018-1-1","2018-2-1","2018-3-1","2018-4-1","2018-5-1","2018-1-1","2018-2-1",
"2018-3-1","2018-4-1","2018-5-1"),
Value=rnorm(10))
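I want to mutate a new column, grouped by Item, that counts how many values are greater than 0 within a window of 3 (or any other integer I specify).
I am familiar with the tidyverse, so a dplyr solution would be most welcome.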
Answer 0 (score: 3)
If you want to roll anything, consider the zoo package.
df$new <- zoo::rollsum(df$Value > 0, 3, fill = NA)
# Item Date Value new
#1 A 2018-1-1 0.5852699 NA
#2 A 2018-2-1 -0.7383377 1
#3 A 2018-3-1 -0.3157693 1
#4 A 2018-4-1 1.2475237 1
#5 A 2018-5-1 -1.5479757 1
#6 B 2018-1-1 -0.6913331 0
#7 B 2018-2-1 -0.2423809 0
#8 B 2018-3-1 -1.6363024 0
#9 B 2018-4-1 -0.3256263 1
#10 B 2018-5-1 0.3563144 NA
You can choose where the window sits relative to the current row. Take a close look at the argument align = c("center", "left", "right").
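To make the effect of align concrete, here is a small sketch on a fixed vector. The vector x and the width k are made up for illustration and are not taken from the question.

```r
library(zoo)

# Illustrative values (an assumption, not data from the question)
x <- c(1, 2, -1, -2, 3, 4)
k <- 3                      # window width
pos <- as.integer(x > 0)    # 1 where the value is positive, 0 otherwise

centered <- rollsum(pos, k, fill = NA, align = "center")  # window centred on each row
trailing <- rollsum(pos, k, fill = NA, align = "right")   # current row plus the k - 1 rows before it
leading  <- rollsum(pos, k, fill = NA, align = "left")    # current row plus the k - 1 rows after it

print(rbind(centered, trailing, leading))
```

The three results contain the same counts; only the rows that receive NA padding differ.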
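As a dplyr chain, grouped by Item so that a window never spans two items (unlike the ungrouped output above):
library(dplyr)
df %>% group_by(Item) %>% dplyr::mutate(new = zoo::rollsum(Value > 0, 3, fill = NA))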
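Since the question asks for a window width that can be changed, the chain can be wrapped in a small function. This is only a sketch: add_rolling_count() and the column name n_pos are hypothetical names, the sample values are made up, and a right-aligned (trailing) window is assumed.

```r
library(dplyr)

# Hypothetical helper: count positive values in a trailing window of width k,
# separately for each Item.
add_rolling_count <- function(data, k) {
  data %>%
    group_by(Item) %>%
    mutate(n_pos = zoo::rollsum(Value > 0, k, fill = NA, align = "right")) %>%
    ungroup()
}

# Made-up sample data, five rows per item
toy <- data.frame(
  Item  = rep(c("A", "B"), each = 5),
  Value = c(3, -3, 4, 5, 1, 2, -5, -2, 0, -4)
)

out <- add_rolling_count(toy, k = 3)
print(out)
```

With a trailing window, the first k - 1 rows of every item are NA because no full window is available yet.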
Answer 1 (score: 1)
You can use the RcppRoll package.
library(RcppRoll)
df$new <- RcppRoll::roll_sum(df$Value > 0, 3, fill = NA)
Using the tidyverse:
df %>%
group_by(Item) %>%
dplyr::mutate(new = RcppRoll::roll_sum(Value > 0, 3, fill = NA))
In terms of speed, it is faster than the zoo package, by roughly a factor of four in the benchmark below:
n <- 10000
df <- data.frame(
Item = sample(LETTERS, n, replace = TRUE),
Value = rnorm(n))
df_grouped <- df %>%
group_by(Item)
microbenchmark::microbenchmark(
RcppRoll = df_grouped <- df_grouped %>% dplyr::mutate(new_RcppRoll = RcppRoll::roll_sum(Value > 0, 3, fill = NA)),
zoo = df_grouped <- df_grouped %>% dplyr::mutate(new_zoo = zoo::rollsum( Value > 0, 3, fill = NA ))
)
Results:
Unit: milliseconds
expr min lq mean median uq max neval
RcppRoll 2.509003 2.741993 2.929227 2.83913 2.983726 5.832962 100
zoo 11.172920 11.785113 13.288970 12.43320 13.607826 25.879754 100
and both approaches return the same values:
all.equal(df_grouped$new_RcppRoll, df_grouped$new_zoo)
TRUE
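If neither package is available, the same count can be cross-checked in base R. The sketch below uses stats::filter() with a vector of ones as the kernel; rolling_count() is a hypothetical helper name, the values are made up, and a trailing window is assumed.

```r
# Hypothetical base R helper: number of positive values in the current row
# and the k - 1 rows before it (NA until a full window is available).
rolling_count <- function(x, k) {
  as.numeric(stats::filter(as.numeric(x > 0), rep(1, k), sides = 1))
}

# Made-up values for illustration
x <- c(1, 2, -1, -2, 3, 4)
counts <- rolling_count(x, 3)
print(counts)

# Per group without dplyr: ave() applies the helper within each item
items <- rep(c("A", "B"), each = 3)
by_item <- ave(x, items, FUN = function(v) rolling_count(v, 2))
print(by_item)
```

This is meant as a sanity check on small inputs rather than a replacement for the packaged functions.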
Answer 2 (score: 0)
Item Date Value
<fct> <date> <int>
1 A 2018-01-01 3
2 B 2018-01-01 2
3 B 2018-02-01 -5
4 A 2018-02-01 -3
5 A 2018-03-01 4
6 B 2018-03-01 -2
7 A 2018-04-01 5
8 B 2018-04-01 0
9 A 2018-05-01 1
10 B 2018-05-01 -4
For clarity, I changed the rnorm example to use sample(-5:5):
df <- df %>%
  mutate(greater_than = (Value > 0) * Value) %>%  # keep positive values, set the rest to 0
  group_by(Item) %>%
  arrange(Date) %>%
  mutate(greater_than = zoo::rollapplyr(greater_than, 3, sum, partial = TRUE))
df %>% arrange(Item) %>% head(10)
It should look like this:
1 A 2018-01-01 3 3
2 A 2018-02-01 -3 3
3 A 2018-03-01 4 7
4 A 2018-04-01 5 9
5 A 2018-05-01 1 10
6 B 2018-01-01 2 2
7 B 2018-02-01 -5 2
8 B 2018-03-01 -2 2
9 B 2018-04-01 0 0
10 B 2018-05-01 -4 0
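Note that the chain in this answer adds up the positive values themselves rather than counting them. If a count with partial windows at the start of each group is what is wanted, a variant could look like the sketch below. It reuses the values from the table above; the data frame name toy and the column name n_pos are assumptions for illustration.

```r
library(dplyr)

toy <- data.frame(
  Item  = rep(c("A", "B"), each = 5),
  Date  = rep(as.Date(c("2018-01-01", "2018-02-01", "2018-03-01",
                        "2018-04-01", "2018-05-01")), times = 2),
  Value = c(3, -3, 4, 5, 1, 2, -5, -2, 0, -4)
)

counted <- toy %>%
  arrange(Item, Date) %>%
  group_by(Item) %>%
  # partial = TRUE shrinks the window at the start of each group instead of returning NA
  mutate(n_pos = zoo::rollapplyr(Value > 0, 3, sum, partial = TRUE)) %>%
  ungroup()

print(counted)
```

Summing the logical vector Value > 0 counts the TRUE entries, so each n_pos is the number of positive values among the current row and up to two rows before it.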