df1 <-
data.frame(Sector=c("auto","auto","auto","industry","industry","industry"),
Topic=c("1","2","3","3","5","5"),
Frequency=c(1,2,5,2,3,2))
df1
df2 <-
data.frame(Sector=c("auto","auto","auto"),
Topic=c("1","2","3"),
Frequency=c(1,2,5))
df2
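I have the data frame above (df1) and want a conditional subset of it that looks like df2. The condition is:
"If at least one observation in a sector has a Frequency greater than 3, all observations for that sector should be kept; otherwise, all observations for that sector should be dropped." In this example, only the three observations for the auto sector remain, i.e. the industry sector is dropped.
Does anyone know a condition I can use to get the target subset?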
Answer 0 (score: 2)
We can use group_by and filter from dplyr to achieve this.
library(dplyr)
df2 <- df1 %>%
group_by(Sector) %>%
filter(any(Frequency > 3)) %>%
ungroup()
df2
# # A tibble: 3 x 3
# Sector Topic Frequency
# <fct> <fct> <dbl>
# 1 auto 1 1.
# 2 auto 2 2.
# 3 auto 3 5.
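The grouped filter works because any(Frequency > 3) yields a single TRUE or FALSE per Sector, and that value is applied to every row of the group. As a rough sketch of the same idea without dplyr, the per-sector flag can be computed with tapply and then broadcast back to the rows. The names keep and result are only illustrative and do not appear in the original answer.

```r
df1 <- data.frame(
  Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
  Topic = c("1", "2", "3", "3", "5", "5"),
  Frequency = c(1, 2, 5, 2, 3, 2)
)

# One logical value per sector: does any observation exceed 3?
keep <- tapply(df1$Frequency > 3, df1$Sector, any)

# Look up each row's sector flag and keep the rows whose sector qualifies
row_flag <- as.vector(keep[as.character(df1$Sector)])
result <- df1[row_flag, ]
result
```

Here keep holds TRUE for auto and FALSE for industry, so only the three auto rows survive.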
Answer 1 (score: 2)
Here is a base R solution:
df1 <-
data.frame(Sector=c("auto","auto","auto","industry","industry","industry"),
Topic=c("1","2","3","3","5","5"),
Frequency=c(1,2,5,2,3,2))
subset(df1, ave(Frequency, Sector, FUN=max) > 3)
And a data.table solution:
library("data.table")
setDT(df1)[, if (max(Frequency) > 3) .SD, by=Sector]
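The base R one-liner relies on ave() returning, for every row, the result of FUN computed over that row's group. A small sketch showing the intermediate vector may make this clearer; the variable name group_max is hypothetical and used only for illustration.

```r
df1 <- data.frame(
  Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
  Topic = c("1", "2", "3", "3", "5", "5"),
  Frequency = c(1, 2, 5, 2, 3, 2)
)

# Per-row maximum Frequency within each Sector
group_max <- ave(df1$Frequency, df1$Sector, FUN = max)
group_max
# 5 5 5 3 3 3

# Rows whose sector maximum exceeds 3 are kept
df1[group_max > 3, ]
```

Since the industry maximum is exactly 3, the strict comparison > 3 excludes that sector.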