Suppose I have a data frame df:
> df
ID Year Weight
1 Brown 1960 5.0
2 Green 1990 3.0
3 Yellow 1961 4.8
4 Green 1994 7.0
5 Green 1993 6.0
6 Brown 1964 8.0
7 Yellow 1960 4.6
If I want to subset all the rows with a weight greater than or equal to 5, I would simply write:
> df[df$Weight >= 5, ]
ID Year Weight
1 Brown 1960 5
4 Green 1994 7
5 Green 1993 6
6 Brown 1964 8
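Unfortunately, the Green row for 1990 is excluded because its weight is less than 5. Is there a way to keep all rows for an ID as long as at least one of its weights is greater than or equal to 5?
Desired output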
> output
ID Year Weight
1 Green 1990 3
2 Green 1993 6
3 Green 1994 7
4 Brown 1960 5
5 Brown 1964 8
Thank you very much!
Answer 0 (score: 1)
We can use dplyr here and, for each ID, keep its rows only if at least one member of that group has a weight of 5 or more:
library(dplyr)
temp <- df %>%
group_by(ID) %>%
mutate(Max_Weight = max(Weight))  # largest weight within each ID
output <- temp[temp$Max_Weight >= 5, ]
output[order(output$ID), ]
ID Year Weight Max_Weight
<chr> <dbl> <dbl> <dbl>
1 Brown 1960 5 8
2 Brown 1964 8 8
3 Green 1990 3 7
4 Green 1994 7 7
5 Green 1993 6 7
Data:
df <- data.frame(ID=c("Brown", "Green", "Yellow", "Green", "Green", "Brown", "Yellow"),
Year=c(1960, 1990, 1961, 1994, 1993, 1964, 1960),
Weight=c(5.0, 3.0, 4.8, 7.0, 6.0, 8.0, 4.6), stringsAsFactors=FALSE)
Answer 1 (score: 1)
Using dplyr, we can group_by ID and use filter:
library(dplyr)
df %>% group_by(ID) %>% filter(any(Weight >= 5))
# ID Year Weight
# <chr> <dbl> <dbl>
#1 Brown 1960 5
#2 Green 1990 3
#3 Green 1994 7
#4 Green 1993 6
#5 Brown 1964 8
Or with data.table:
library(data.table)
setDT(df)
df[, .SD[any(Weight >= 5)], ID]
Answer 2 (score: 1)
Convert to a data.table:
> library(data.table)
> setDT(df)
> df[ID %in% df[Weight >= 5, ID]]
ID Year Weight
1: Brown 1960 5
2: Green 1990 3
3: Green 1994 7
4: Green 1993 6
5: Brown 1964 8
Answer 3 (score: 0)
Here is a base R solution using ave() and subset():
dfout <- subset(df, as.logical(with(df,ave(Weight, ID, FUN = function(x) any(x>=5)))))
which gives:
> dfout
ID Year Weight
1 Brown 1960 5
2 Green 1990 3
4 Green 1994 7
5 Green 1993 6
6 Brown 1964 8
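For comparison, the same result can probably be reached in base R without ave(): first collect the IDs that have at least one weight of 5 or more, then keep every row whose ID is in that set. This is only a minimal sketch on the example data from the question; keep_ids and dfout2 are names made up for illustration.

```r
# Example data copied from the question.
df <- data.frame(
  ID = c("Brown", "Green", "Yellow", "Green", "Green", "Brown", "Yellow"),
  Year = c(1960, 1990, 1961, 1994, 1993, 1964, 1960),
  Weight = c(5.0, 3.0, 4.8, 7.0, 6.0, 8.0, 4.6),
  stringsAsFactors = FALSE
)

# keep_ids (hypothetical name): IDs with at least one Weight >= 5.
keep_ids <- unique(df$ID[df$Weight >= 5])

# Keep every row whose ID appears in that set, including rows below 5.
dfout2 <- df[df$ID %in% keep_ids, ]
dfout2
```

The lookup is done once over the whole column, so no per-group function call is needed. The original row names are kept, as in the subset() output.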