How to count the number of rows that satisfy a condition

Time: 2018-01-16 01:44:52

Tags: r

I have a data frame df of stock liquidity (dput output below). For each row, I need to count how many of the previous three rows have liquidity > 10:

  date     sec_id  liquidity   good       count.good.rows
2016-07-29   3277  9.142245 FALSE               0
2016-08-31   3277 11.070555  TRUE               0
2016-09-30   3277 11.934113  TRUE               1
2016-10-31   3277 12.192237  TRUE               2
2016-11-30   3277 10.165183  TRUE               3
2016-12-30   3277  8.414033 FALSE               3
2016-01-29   3426  6.494181 FALSE               0
2016-02-29   3426  8.216213 FALSE               0
2016-03-31   3426 10.081115  TRUE               0
2016-04-29   3426 10.119685  TRUE               1
2016-05-31   3426  8.659732 FALSE               2
2016-06-30   3426  6.790178 FALSE               1
2016-07-29   3426  7.234159 FALSE               0

A few things to note about the data:

  1. There are multiple sec_id values, and the count must be computed separately for each sec_id, with rows in date order.
  2. I have already added the good column. The obvious way to get count.good.rows would be lag(..., 1) + lag(..., 2) + lag(..., 3), as sketched below, but that would be a poor solution: I need the 3 to be a variable (I may eventually want to look back two rows, or four).
  3. Any ideas?
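For concreteness, the hard-coded version I want to avoid looks something like this (a sketch using dplyr; the window width 3 is baked into the chain of lag calls):

    library(dplyr)

    # Hard-coded window: works, but the 3 is not a variable.
    df %>%
      group_by(sec_id) %>%
      mutate(count.good.rows = lag(good, 1, default = FALSE) +
                               lag(good, 2, default = FALSE) +
                               lag(good, 3, default = FALSE))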

    Here is my dput:

    df = structure(list(date = structure(c(16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165, 16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165), class = "Date"),
        sec_id = c(3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L),
        liquidity = c(4.014428, 3.779665, 4.833813, 5.244417, 7.150838, 7.639399, 9.142245, 11.070555, 11.934113, 12.192237, 10.165183, 8.414033, 6.494181, 8.216213, 10.081115, 10.119685, 8.659732, 6.790178, 7.234159, 8.529101, 9.015898, 8.307979, 8.231237, 8.711095),
        good = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)),
        class = "data.frame", .Names = c("date", "sec_id", "liquidity", "good"), 
        row.names = c(NA, -24L))
    

2 Answers:

Answer 0 (score: 1)

You can define a lagcounter function that takes the cumulative sum of good, subtracts that cumulative sum lagged by n rows (which yields a rolling n-row sum), and then lags the result by one more row so the current row is excluded:

lagcounter = function(x, n) {
  y <- cumsum(x)                                # running count of TRUEs
  # lag() here is dplyr::lag, so dplyr must be loaded when this is called
  lag(y - lag(y, n, default = 0), default = 0)  # n-row rolling sum, shifted back one row
}
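To see the mechanics on a toy vector (a quick sketch; x is a made-up example):

library(dplyr)

x <- c(FALSE, TRUE, TRUE, TRUE, FALSE)
cumsum(x)          # 0 1 2 3 3  running total of TRUEs
lagcounter(x, 3)   # 0 0 1 2 3  good count over the previous 3 rows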

Then, using dplyr, group by sec_id and apply the newly defined function inside a mutate statement:

library(dplyr)
df %>% group_by(sec_id) %>% mutate(count.good.rows = lagcounter(good,3)) 

         date sec_id liquidity  good count.good.rows
1  2016-01-29   3277  4.014428 FALSE               0
2  2016-02-29   3277  3.779665 FALSE               0
3  2016-03-31   3277  4.833813 FALSE               0
4  2016-04-29   3277  5.244417 FALSE               0
5  2016-05-31   3277  7.150838 FALSE               0
6  2016-06-30   3277  7.639399 FALSE               0
7  2016-07-29   3277  9.142245 FALSE               0
8  2016-08-31   3277 11.070555  TRUE               0
9  2016-09-30   3277 11.934113  TRUE               1
10 2016-10-31   3277 12.192237  TRUE               2
11 2016-11-30   3277 10.165183  TRUE               3
12 2016-12-30   3277  8.414033 FALSE               3
13 2016-01-29   3426  6.494181 FALSE               0
14 2016-02-29   3426  8.216213 FALSE               0
15 2016-03-31   3426 10.081115  TRUE               0
16 2016-04-29   3426 10.119685  TRUE               1
17 2016-05-31   3426  8.659732 FALSE               2
18 2016-06-30   3426  6.790178 FALSE               2
19 2016-07-29   3426  7.234159 FALSE               1
20 2016-08-31   3426  8.529101 FALSE               0
21 2016-09-30   3426  9.015898 FALSE               0
22 2016-10-31   3426  8.307979 FALSE               0
23 2016-11-30   3426  8.231237 FALSE               0
24 2016-12-30   3426  8.711095 FALSE               0
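Since n is an ordinary argument, widening the window is just a parameter change; for example, to look back four rows instead of three (a hypothetical variation on the same call):

df %>% group_by(sec_id) %>% mutate(count.good.rows = lagcounter(good, 4))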

Answer 1 (score: 1)

Try zoo::rollapply (this version counts good over a right-aligned window of width 3 that includes the current row):

library(zoo)
library(dplyr)

df %>%
  group_by(sec_id) %>%
  mutate(count_good_rows = rollapply(good, 3, sum, align = "right", partial = TRUE))

# A tibble: 13 x 5
# Groups: sec_id [2]
   # date       sec_id liquidity good  count_good_rows
   # <fctr>      <int>     <dbl> <lgl>           <int>
 # 1 2016-07-29   3277      9.14 F                   0
 # 2 2016-08-31   3277     11.1  T                   1
 # 3 2016-09-30   3277     11.9  T                   2
 # 4 2016-10-31   3277     12.2  T                   3
 # 5 2016-11-30   3277     10.2  T                   3
 # 6 2016-12-30   3277      8.41 F                   2
 # 7 2016-01-29   3426      6.49 F                   0
 # 8 2016-02-29   3426      8.22 F                   0
 # 9 2016-03-31   3426     10.1  T                   1
# 10 2016-04-29   3426     10.1  T                   2
# 11 2016-05-31   3426      8.66 F                   2
# 12 2016-06-30   3426      6.79 F                   1
# 13 2016-07-29   3426      7.23 F                   0
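Note that partial = TRUE lets the window shrink at the start of each group instead of producing NAs. On a toy vector (a sketch, assuming zoo is loaded):

rollapply(c(TRUE, TRUE, FALSE, TRUE), 3, sum, align = "right", partial = TRUE)
# 1 2 2 2   (windows of width 1, 2, 3, 3)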

EDIT: If you only want to count the previous three rows, excluding the current one, lag good by one row first:

df %>%
  group_by(sec_id) %>%
  mutate(count_good_rows = rollapply(dplyr::lag(good, 1), 3,
                                     function(i) sum(i, na.rm = TRUE),
                                     align = "right", partial = TRUE))

# A tibble: 13 x 5
# Groups: sec_id [2]
   # date       sec_id liquidity good  count_good_rows
   # <fctr>      <int>     <dbl> <lgl>           <int>
 # 1 2016-07-29   3277      9.14 F                   0
 # 2 2016-08-31   3277     11.1  T                   0
 # 3 2016-09-30   3277     11.9  T                   1
 # 4 2016-10-31   3277     12.2  T                   2
 # 5 2016-11-30   3277     10.2  T                   3
 # 6 2016-12-30   3277      8.41 F                   3
 # 7 2016-01-29   3426      6.49 F                   0
 # 8 2016-02-29   3426      8.22 F                   0
 # 9 2016-03-31   3426     10.1  T                   0
# 10 2016-04-29   3426     10.1  T                   1
# 11 2016-05-31   3426      8.66 F                   2
# 12 2016-06-30   3426      6.79 F                   2
# 13 2016-07-29   3426      7.23 F                   1
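The window width is likewise an ordinary argument here, so the same pipeline generalizes; a sketch with the width pulled out into a variable k (k = 4 is an assumed example):

k <- 4  # look back k rows instead of 3
df %>%
  group_by(sec_id) %>%
  mutate(count_good_rows = rollapply(dplyr::lag(good, 1), k,
                                     function(i) sum(i, na.rm = TRUE),
                                     align = "right", partial = TRUE))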

Data

df <- read.table(text="date     sec_id  liquidity   good     
2016-07-29   3277  9.142245 FALSE 
2016-08-31   3277 11.070555  TRUE  
2016-09-30   3277 11.934113  TRUE  
2016-10-31   3277 12.192237  TRUE  
2016-11-30   3277 10.165183  TRUE  
2016-12-30   3277  8.414033 FALSE 
2016-01-29   3426  6.494181 FALSE 
2016-02-29   3426  8.216213 FALSE 
2016-03-31   3426 10.081115  TRUE  
2016-04-29   3426 10.119685  TRUE  
2016-05-31   3426  8.659732 FALSE 
2016-06-30   3426  6.790178 FALSE  
2016-07-29   3426  7.234159 FALSE  ", header=TRUE)