Conditional subset of a data frame under a special condition

Time: 2018-03-13 14:22:45

Tags: r dataframe conditional subset

df1 <- data.frame(
  Sector    = c("auto", "auto", "auto", "industry", "industry", "industry"),
  Topic     = c("1", "2", "3", "3", "5", "5"),
  Frequency = c(1, 2, 5, 2, 3, 2)
)
df1

df2 <- data.frame(
  Sector    = c("auto", "auto", "auto"),
  Topic     = c("1", "2", "3"),
  Frequency = c(1, 2, 5)
)
df2

I have the data frame df1 above and want a conditional subset of it that looks like df2. The condition is:

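"If at least one observation in a sector has a Frequency greater than 3, keep all observations of that sector; otherwise, drop all observations of that sector." In this example, only the three observations of the auto sector remain, i.e. the industry sector is dropped.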
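To make the condition concrete, it can be evaluated once per sector before any rows are selected. A minimal base R sketch, assuming df1 as defined above (repeated so the snippet runs on its own); `keep` is just an illustrative variable name. For this data, `keep` is TRUE for auto and FALSE for industry.

```r
df1 <- data.frame(
  Sector    = c("auto", "auto", "auto", "industry", "industry", "industry"),
  Topic     = c("1", "2", "3", "3", "5", "5"),
  Frequency = c(1, 2, 5, 2, 3, 2)
)

# One logical per sector: does at least one observation have Frequency > 3?
keep <- tapply(df1$Frequency, df1$Sector, function(f) any(f > 3))
keep

# Keep every row whose sector passed the check
df2 <- df1[df1$Sector %in% names(keep)[keep], ]
df2
```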

Does anyone know how to express this condition to get the target subset?

2 answers:

Answer 0 (score: 2)

We can use group_by and filter from dplyr to achieve this.

library(dplyr)

df2 <- df1 %>%
  group_by(Sector) %>%
  filter(any(Frequency > 3)) %>%
  ungroup()
df2
# # A tibble: 3 x 3
#   Sector Topic Frequency
#   <fct>  <fct>     <dbl>
# 1 auto   1            1.
# 2 auto   2            2.
# 3 auto   3            5.
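If you are on dplyr 1.1.0 or later (an assumption; older versions do not have this argument), the same filter can be written without the explicit group_by()/ungroup() pair by using the per-operation `.by` argument. The grouping applies only to this one filter() call, so the result comes back ungrouped. A sketch:

```r
library(dplyr)

df1 <- data.frame(
  Sector    = c("auto", "auto", "auto", "industry", "industry", "industry"),
  Topic     = c("1", "2", "3", "3", "5", "5"),
  Frequency = c(1, 2, 5, 2, 3, 2)
)

# .by = Sector groups only for this call; no ungroup() is needed afterwards
df2 <- df1 %>%
  filter(any(Frequency > 3), .by = Sector)
df2
```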

Answer 1 (score: 2)

Here is a solution in base R:

df1 <- data.frame(Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
                  Topic = c("1", "2", "3", "3", "5", "5"),
                  Frequency = c(1, 2, 5, 2, 3, 2))
# Keep rows whose sector's maximum Frequency exceeds 3
subset(df1, ave(Frequency, Sector, FUN = max) > 3)
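The base R version works because ave() returns a vector as long as its input, with each element replaced by FUN applied to that element's group, so subset() can compare it row by row. Checking max(Frequency) > 3 is equivalent to checking any(Frequency > 3). A short sketch of the intermediate step, with df1 as above; `sector_max` is only an illustrative name:

```r
df1 <- data.frame(
  Sector    = c("auto", "auto", "auto", "industry", "industry", "industry"),
  Topic     = c("1", "2", "3", "3", "5", "5"),
  Frequency = c(1, 2, 5, 2, 3, 2)
)

# Each row gets the maximum Frequency of its own Sector
sector_max <- ave(df1$Frequency, df1$Sector, FUN = max)
sector_max
# [1] 5 5 5 3 3 3

# Rows whose sector maximum exceeds 3
subset(df1, sector_max > 3)
```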

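And here is a solution with data.table:

library("data.table")
# setDT() converts df1 to a data.table in place; .SD (the group's rows)
# is returned only for sectors whose maximum Frequency exceeds 3
setDT(df1)[, if (max(Frequency) > 3) .SD, by = Sector]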
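A data.table variant, sketched as an alternative rather than a replacement: collect the row numbers of qualifying sectors with `.I` in a grouped call, then subset the table once. Grouped results place the `by` columns first; this only matters when Sector is not already the first column, and the `.I` approach keeps the original column order either way. `dt` and `rows` are illustrative names.

```r
library(data.table)

dt <- data.table(
  Sector    = c("auto", "auto", "auto", "industry", "industry", "industry"),
  Topic     = c("1", "2", "3", "3", "5", "5"),
  Frequency = c(1, 2, 5, 2, 3, 2)
)

# .I holds row numbers of dt; a group returning integer(0) contributes no rows.
# The grouped call yields columns Sector and V1; V1 holds the kept row numbers.
rows <- dt[, .I[any(Frequency > 3)], by = Sector]$V1
dt[rows]
```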