Filter certain ID variables based on a combination of start date and value

Time: 2018-07-20 23:08:35

Tags: r dplyr

structure(list(Stock = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L, 4L, 4L, 4L), .Label = c("AAA", "BBB", "CCC", "DDD"), class = "factor"), 
    Date = structure(c(17632, 17633, 17634, 17632, 17633, 17634, 
    17632, 17633, 17634, 17632, 17633, 17634), class = "Date"), 
    Price = c(5L, 6L, 7L, 10L, 9L, 9L, 6L, 6L, 6L, 10L, 30L, 
    50L), Market.Cap = c(1000L, 1300L, 1600L, 1600L, 1000L, 1000L, 
    600L, 600L, 600L, 400L, 1000L, 2000L)), .Names = c("Stock", 
"Date", "Price", "Market.Cap"), row.names = c(NA, -12L), class = "data.frame")

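I am trying to filter this sample based on market cap at a specific date. For example, I want to remove stock BBB because it is the only stock whose market cap is above 1500 on 2018-04-11. However, if a stock's market cap only grows above 1500 later (as with AAA and DDD), that stock can stay in the data frame. I tried dplyr but could not come up with a combination that does this.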

2 answers:

Answer 0 (score: 1)

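You can use a grouped filter. The trick is to realize that a grouped filter is really a grouped mutate followed by an ungrouped filter. To see why, consider the comparable code at the end, which keeps the same rows. We just need to check whether any row within each stock has both the specific date and a market cap above the limit; if so, every row of that stock is dropped.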

df <- structure(list(Stock = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("AAA", "BBB", "CCC", "DDD"), class = "factor"), Date = structure(c(17632, 17633, 17634, 17632, 17633, 17634, 17632, 17633, 17634, 17632, 17633, 17634), class = "Date"), Price = c(5L, 6L, 7L, 10L, 9L, 9L, 6L, 6L, 6L, 10L, 30L, 50L), Market.Cap = c(1000L, 1300L, 1600L, 1600L, 1000L, 1000L, 600L, 600L, 600L, 400L, 1000L, 2000L)), .Names = c("Stock", "Date", "Price", "Market.Cap"), row.names = c(NA, -12L), class = "data.frame")

library(tidyverse)
df %>%
  group_by(Stock) %>%
  filter(!any(Market.Cap > 1500 & Date == as.Date("2018-04-11")))
#> # A tibble: 9 x 4
#> # Groups:   Stock [3]
#>   Stock Date       Price Market.Cap
#>   <fct> <date>     <int>      <int>
#> 1 AAA   2018-04-11     5       1000
#> 2 AAA   2018-04-12     6       1300
#> 3 AAA   2018-04-13     7       1600
#> 4 CCC   2018-04-11     6        600
#> 5 CCC   2018-04-12     6        600
#> 6 CCC   2018-04-13     6        600
#> 7 DDD   2018-04-11    10        400
#> 8 DDD   2018-04-12    30       1000
#> 9 DDD   2018-04-13    50       2000

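# Equivalent two-step version: compute one keep flag per stock,
# then filter on it (the result also carries the extra keep column)
df %>%
  group_by(Stock) %>%
  mutate(keep = !any(Market.Cap > 1500 & Date == as.Date("2018-04-11"))) %>%
  filter(keep)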
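
As a rough check of the equivalence described above, the following sketch computes the per-stock flag explicitly and compares the two pipelines. The names `cutoff_date`, `cap_limit`, `flags`, `via_filter` and `via_mutate` are made up for illustration, and the data frame is rebuilt compactly from the values in the question so the block runs on its own.

```r
library(dplyr)

# Same values as the sample data in the question, rebuilt compactly
df <- data.frame(
  Stock = factor(rep(c("AAA", "BBB", "CCC", "DDD"), each = 3)),
  Date = rep(as.Date("2018-04-11") + 0:2, times = 4),
  Price = c(5L, 6L, 7L, 10L, 9L, 9L, 6L, 6L, 6L, 10L, 30L, 50L),
  Market.Cap = c(1000L, 1300L, 1600L, 1600L, 1000L, 1000L,
                 600L, 600L, 600L, 400L, 1000L, 2000L)
)

# Hypothetical parameter names, not part of the original answer
cutoff_date <- as.Date("2018-04-11")
cap_limit <- 1500

# One logical flag per stock: TRUE if any row is over the limit on the cutoff date
flags <- df %>%
  group_by(Stock) %>%
  summarise(over_limit_on_cutoff = any(Market.Cap > cap_limit & Date == cutoff_date))

# Grouped filter, as in the answer
via_filter <- df %>%
  group_by(Stock) %>%
  filter(!any(Market.Cap > cap_limit & Date == cutoff_date)) %>%
  ungroup()

# Grouped mutate, then an ungrouped filter on the helper column
via_mutate <- df %>%
  group_by(Stock) %>%
  mutate(keep = !any(Market.Cap > cap_limit & Date == cutoff_date)) %>%
  ungroup() %>%
  filter(keep) %>%
  select(-keep)

flags
all.equal(as.data.frame(via_filter), as.data.frame(via_mutate))
```

In `flags`, only BBB should come out flagged, which is why all of its rows disappear while AAA and DDD, which exceed 1500 only on later dates, are kept.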

Created on 2018-07-20 by the reprex package (v0.2.0).

Answer 1 (score: 0)

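Edit: if the OP only wants to filter out rows that fall on a specific date and have a value greater than 1500 in the last column, the following may help.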

# Drops only the rows that are above 1500 on the given date, not the whole stock
subset(df122, !(Market.Cap > 1500 & Date == as.Date("2018-04-11")))

Using base R only:

tat <- apply(df122[4], 1, function(val) any(val < 1500))
df122[ tat , ]

where your structure is stored in df122.
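
If the goal is to drop the whole stock, as the question asks, rather than individual rows, one possible base R sketch is to first collect the stocks that break the rule on the cutoff date and then exclude all of their rows. This is not the code from the answer above; `cutoff_date`, `cap_limit`, `hit`, `bad_stocks` and `result` are illustrative names, and `df122` is rebuilt here from the values in the question.

```r
# Same values as the sample data in the question, stored in df122
df122 <- data.frame(
  Stock = factor(rep(c("AAA", "BBB", "CCC", "DDD"), each = 3)),
  Date = rep(as.Date("2018-04-11") + 0:2, times = 4),
  Price = c(5L, 6L, 7L, 10L, 9L, 9L, 6L, 6L, 6L, 10L, 30L, 50L),
  Market.Cap = c(1000L, 1300L, 1600L, 1600L, 1000L, 1000L,
                 600L, 600L, 600L, 400L, 1000L, 2000L)
)

# Hypothetical parameter names
cutoff_date <- as.Date("2018-04-11")
cap_limit <- 1500

# Rows that are over the limit on the cutoff date
hit <- df122$Market.Cap > cap_limit & df122$Date == cutoff_date

# Stocks that have at least one such row
bad_stocks <- unique(df122$Stock[hit])

# Keep every row whose stock is not in that set
result <- df122[!(df122$Stock %in% bad_stocks), ]
result
```

With the sample data this removes the three BBB rows and leaves AAA, CCC and DDD untouched.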