我有一个股票流动性的数据框df
(下面是dput
声明),我需要衡量该值之前连续三行的数量> 10:
date sec_id liquidity good count.good.rows
2016-07-29 3277 9.142245 FALSE 0
2016-08-31 3277 11.070555 TRUE 0
2016-09-30 3277 11.934113 TRUE 1
2016-10-31 3277 12.192237 TRUE 2
2016-11-30 3277 10.165183 TRUE 3
2016-12-30 3277 8.414033 FALSE 3
2016-01-29 3426 6.494181 FALSE 0
2016-02-29 3426 8.216213 FALSE 0
2016-03-31 3426 10.081115 TRUE 0
2016-04-29 3426 10.119685 TRUE 1
2016-05-31 3426 8.659732 FALSE 2
2016-06-30 3426 6.790178 FALSE 1
2016-07-29 3426 7.234159 FALSE 0
请注意有关数据的一些事项:
sec_id
值,我需要根据sec_id
列的顺序对每个data
值执行此操作。good
列,但无法明确使用count.good.rows
来了解如何执行lag(...,1) + lag(...,2) + lag(...,3)
列。这将是一个糟糕的解决方案,因为我需要3
作为变量(我可能最终想要查看前两行或四行)。有什么想法吗?
这是我的dput
:
df = structure(list(date = structure(c(16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165, 16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165), class = "Date"),
sec_id = c(3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L),
liquidity = c(4.014428, 3.779665, 4.833813, 5.244417, 7.150838, 7.639399, 9.142245, 11.070555, 11.934113, 12.192237, 10.165183, 8.414033, 6.494181, 8.216213, 10.081115, 10.119685, 8.659732, 6.790178, 7.234159, 8.529101, 9.015898, 8.307979, 8.231237, 8.711095),
good = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)),
class = "data.frame", .Names = c("date", "sec_id", "liquidity", "good"),
row.names = c(NA, -24L))
答案 0 :(得分:1)
你可以定义一个lagcounter函数,它将累积和n good
的累积和减去:
lagcounter = function(x,n) {y = cumsum(x); lag(y-lag(y,n,default=0),default=0)}
然后使用dplyr
语法,在按mutate
分组后,在sec_id
语句中使用此新定义的函数:
library(dplyr)
df %>% group_by(sec_id) %>% mutate(count.good.rows = lagcounter(good,3))
date sec_id liquidity good count.good.rows
1 2016-01-29 3277 4.014428 FALSE 0
2 2016-02-29 3277 3.779665 FALSE 0
3 2016-03-31 3277 4.833813 FALSE 0
4 2016-04-29 3277 5.244417 FALSE 0
5 2016-05-31 3277 7.150838 FALSE 0
6 2016-06-30 3277 7.639399 FALSE 0
7 2016-07-29 3277 9.142245 FALSE 0
8 2016-08-31 3277 11.070555 TRUE 0
9 2016-09-30 3277 11.934113 TRUE 1
10 2016-10-31 3277 12.192237 TRUE 2
11 2016-11-30 3277 10.165183 TRUE 3
12 2016-12-30 3277 8.414033 FALSE 3
13 2016-01-29 3426 6.494181 FALSE 0
14 2016-02-29 3426 8.216213 FALSE 0
15 2016-03-31 3426 10.081115 TRUE 0
16 2016-04-29 3426 10.119685 TRUE 1
17 2016-05-31 3426 8.659732 FALSE 2
18 2016-06-30 3426 6.790178 FALSE 2
19 2016-07-29 3426 7.234159 FALSE 1
20 2016-08-31 3426 8.529101 FALSE 0
21 2016-09-30 3426 9.015898 FALSE 0
22 2016-10-31 3426 8.307979 FALSE 0
23 2016-11-30 3426 8.231237 FALSE 0
24 2016-12-30 3426 8.711095 FALSE 0
答案 1 :(得分:1)
尝试zoo::rollapply
library(zoo)
df %>%
group_by(sec_id) %>%
mutate(count_good_rows = rollapply(good, 3, sum, align="right", partial=TRUE))
# A tibble: 13 x 5
# Groups: sec_id [2]
# date sec_id liquidity good count_good_rows
# <fctr> <int> <dbl> <lgl> <int>
# 1 2016-07-29 3277 9.14 F 0
# 2 2016-08-31 3277 11.1 T 1
# 3 2016-09-30 3277 11.9 T 2
# 4 2016-10-31 3277 12.2 T 3
# 5 2016-11-30 3277 10.2 T 3
# 6 2016-12-30 3277 8.41 F 2
# 7 2016-01-29 3426 6.49 F 0
# 8 2016-02-29 3426 8.22 F 0
# 9 2016-03-31 3426 10.1 T 1
# 10 2016-04-29 3426 10.1 T 2
# 11 2016-05-31 3426 8.66 F 2
# 12 2016-06-30 3426 6.79 F 1
# 13 2016-07-29 3426 7.23 F 0
编辑如果您只想计算之前三行
df %>%
group_by(sec_id) %>%
mutate(count_good_rows = rollapply(dplyr::lag(good, 1), 3, function(i) sum(i, na.rm=TRUE), align="right", partial=TRUE))
# A tibble: 13 x 5
# Groups: sec_id [2]
# date sec_id liquidity good count_good_rows
# <fctr> <int> <dbl> <lgl> <int>
# 1 2016-07-29 3277 9.14 F 0
# 2 2016-08-31 3277 11.1 T 0
# 3 2016-09-30 3277 11.9 T 1
# 4 2016-10-31 3277 12.2 T 2
# 5 2016-11-30 3277 10.2 T 3
# 6 2016-12-30 3277 8.41 F 3
# 7 2016-01-29 3426 6.49 F 0
# 8 2016-02-29 3426 8.22 F 0
# 9 2016-03-31 3426 10.1 T 0
# 10 2016-04-29 3426 10.1 T 1
# 11 2016-05-31 3426 8.66 F 2
# 12 2016-06-30 3426 6.79 F 2
# 13 2016-07-29 3426 7.23 F 1
数据
df <- read.table(text="date sec_id liquidity good
2016-07-29 3277 9.142245 FALSE
2016-08-31 3277 11.070555 TRUE
2016-09-30 3277 11.934113 TRUE
2016-10-31 3277 12.192237 TRUE
2016-11-30 3277 10.165183 TRUE
2016-12-30 3277 8.414033 FALSE
2016-01-29 3426 6.494181 FALSE
2016-02-29 3426 8.216213 FALSE
2016-03-31 3426 10.081115 TRUE
2016-04-29 3426 10.119685 TRUE
2016-05-31 3426 8.659732 FALSE
2016-06-30 3426 6.790178 FALSE
2016-07-29 3426 7.234159 FALSE ", header=TRUE)