structure(list(Stock = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L, 4L, 4L, 4L), .Label = c("AAA", "BBB", "CCC", "DDD"), class = "factor"),
Date = structure(c(17632, 17633, 17634, 17632, 17633, 17634,
17632, 17633, 17634, 17632, 17633, 17634), class = "Date"),
Price = c(5L, 6L, 7L, 10L, 9L, 9L, 6L, 6L, 6L, 10L, 30L,
50L), Market.Cap = c(1000L, 1300L, 1600L, 1600L, 1000L, 1000L,
600L, 600L, 600L, 400L, 1000L, 2000L)), .Names = c("Stock",
"Date", "Price", "Market.Cap"), row.names = c(NA, -12L), class = "data.frame")
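I am trying to filter this sample based on market cap on a specific date. For example, I want to remove stock BBB, because it is the only stock whose market cap exceeds 1500 on 2018-04-11. However, a stock whose market cap only later grows above 1500 (such as AAA and DDD) can stay in the data frame. I tried dplyr, but could not come up with a combination of verbs that does this.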
Answer 0 (score: 1)
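You can use a grouped filter(). The trick is to realize that a grouped filter is really a grouped mutate followed by an ungrouped filter. To see why, compare the second pipeline at the end, which gives the same result. We only need to check whether any() row within each stock has both the specific date and a market cap that is too high.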
df <- structure(list(Stock = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("AAA", "BBB", "CCC", "DDD"), class = "factor"), Date = structure(c(17632, 17633, 17634, 17632, 17633, 17634, 17632, 17633, 17634, 17632, 17633, 17634), class = "Date"), Price = c(5L, 6L, 7L, 10L, 9L, 9L, 6L, 6L, 6L, 10L, 30L, 50L), Market.Cap = c(1000L, 1300L, 1600L, 1600L, 1000L, 1000L, 600L, 600L, 600L, 400L, 1000L, 2000L)), .Names = c("Stock", "Date", "Price", "Market.Cap"), row.names = c(NA, -12L), class = "data.frame")
library(tidyverse)
df %>%
  group_by(Stock) %>%
  filter(!any(Market.Cap > 1500 & Date == as.Date("2018-04-11")))
#> # A tibble: 9 x 4
#> # Groups:   Stock [3]
#>   Stock Date       Price Market.Cap
#>   <fct> <date>     <int>      <int>
#> 1 AAA   2018-04-11     5       1000
#> 2 AAA   2018-04-12     6       1300
#> 3 AAA   2018-04-13     7       1600
#> 4 CCC   2018-04-11     6        600
#> 5 CCC   2018-04-12     6        600
#> 6 CCC   2018-04-13     6        600
#> 7 DDD   2018-04-11    10        400
#> 8 DDD   2018-04-12    30       1000
#> 9 DDD   2018-04-13    50       2000
df %>%
  group_by(Stock) %>%
  mutate(keep = !any(Market.Cap > 1500 & Date == as.Date("2018-04-11"))) %>%
  filter(keep == TRUE)
Created on 2018-07-20 by the reprex package (v0.2.0).
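The equivalence between the two pipelines can be checked directly. The following is a minimal sketch, not part of the original answer: the names cutoff_date, via_filter, via_mutate and same_result are made up for illustration, and the sample data is rebuilt with data.frame() instead of the dput() output purely for readability.

```r
library(dplyr)

# Same sample data as in the question, rebuilt in a more readable form
df <- data.frame(
  Stock = factor(rep(c("AAA", "BBB", "CCC", "DDD"), each = 3)),
  Date = rep(as.Date("2018-04-11") + 0:2, times = 4),
  Price = c(5L, 6L, 7L, 10L, 9L, 9L, 6L, 6L, 6L, 10L, 30L, 50L),
  Market.Cap = c(1000L, 1300L, 1600L, 1600L, 1000L, 1000L,
                 600L, 600L, 600L, 400L, 1000L, 2000L)
)

# Hypothetical helper variable: the date on which the cap is checked
cutoff_date <- as.Date("2018-04-11")

# Version 1: grouped filter
via_filter <- df %>%
  group_by(Stock) %>%
  filter(!any(Market.Cap > 1500 & Date == cutoff_date)) %>%
  ungroup()

# Version 2: grouped mutate, then an ungrouped filter on the flag column
via_mutate <- df %>%
  group_by(Stock) %>%
  mutate(keep = !any(Market.Cap > 1500 & Date == cutoff_date)) %>%
  ungroup() %>%
  filter(keep) %>%
  select(-keep)

# Compare the two results as plain data frames
same_result <- isTRUE(all.equal(as.data.frame(via_filter),
                                as.data.frame(via_mutate)))
same_result
```

Because any() returns a single logical value per group, mutate() recycles it across every row of that stock, which is why filtering on the flag afterwards removes the whole group at once.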
Answer 1 (score: 0)
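EDIT: If the OP only wants a filter condition that applies on a specific date, where the value in the last column is greater than 1500, then this may help.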
# Drops only the rows dated 2018-04-11 whose Market.Cap exceeds 1500
subset(df122, !(Market.Cap > 1500 & Date == as.Date("2018-04-11")))
Using base R only:
tat <- apply(df122[4], 1, function(val) any(val < 1500))
df122[ tat , ]
Here df122 is the data frame in which your structure is stored.
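The snippets in this answer work row by row, so they do not remove a whole stock. If the goal is what the question describes (dropping every row of a stock that exceeds 1500 on the given date), a base R sketch could look like the following. It is an illustration under stated assumptions rather than the answerer's code: cutoff_date, flagged and kept are hypothetical names, and df122 is rebuilt here so the block runs on its own.

```r
# Same sample data as in the question, stored in df122 as in this answer
df122 <- data.frame(
  Stock = factor(rep(c("AAA", "BBB", "CCC", "DDD"), each = 3)),
  Date = rep(as.Date("2018-04-11") + 0:2, times = 4),
  Price = c(5L, 6L, 7L, 10L, 9L, 9L, 6L, 6L, 6L, 10L, 30L, 50L),
  Market.Cap = c(1000L, 1300L, 1600L, 1600L, 1000L, 1000L,
                 600L, 600L, 600L, 400L, 1000L, 2000L)
)

# Hypothetical helper variable: the date on which the cap is checked
cutoff_date <- as.Date("2018-04-11")

# Stocks that break the cap on the cutoff date
flagged <- unique(df122$Stock[df122$Date == cutoff_date &
                                df122$Market.Cap > 1500])

# Keep every row whose stock was not flagged
kept <- df122[!df122$Stock %in% flagged, ]
kept
```

Identifying the offending stocks first and then excluding them with %in% keeps the date check and the row selection separate, which mirrors the grouped logic of the dplyr answer without any packages.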