How to subset data based on two conditions

Time: 2019-12-25 07:56:34

Tags: r subset threshold

Suppose I have a data frame df:

> df
      ID Year Weight
1  Brown 1960    5.0
2  Green 1990    3.0
3 Yellow 1961    4.8
4  Green 1994    7.0
5  Green 1993    6.0
6  Brown 1964    8.0
7 Yellow 1960    4.6

If I want to keep only the rows whose Weight is greater than or equal to 5, I would simply write:

> df[df$Weight >= 5, ]
     ID Year Weight
1 Brown 1960      5
4 Green 1994      7
5 Green 1993      6
6 Brown 1964      8

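Unfortunately, the Green row from 1990 is excluded because its weight is less than 5. Is there a way to keep all rows of an ID, as long as at least one of that ID's weights is greater than or equal to 5?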

Desired output:

> output
     ID Year Weight
1 Green 1990      3
2 Green 1993      6
3 Green 1994      7
4 Brown 1960      5
5 Brown 1964      8

Many thanks!

4 answers:

Answer 0 (score: 1)

We can use dplyr here and, for each ID, keep only the rows of groups in which at least one member has a weight of 5 or higher:

library(dplyr)

temp <- df %>%
    group_by(ID) %>%
    mutate(Max_Weight = max(Weight))   # largest Weight within each ID

output <- temp[temp$Max_Weight >= 5, ]
output[order(output$ID), ]

  ID     Year Weight Max_Weight
  <chr> <dbl>  <dbl>      <dbl>
1 Brown  1960      5          8
2 Brown  1964      8          8
3 Green  1990      3          7
4 Green  1994      7          7
5 Green  1993      6          7

Data:

df <- data.frame(ID=c("Brown", "Green", "Yellow", "Green", "Green", "Brown", "Yellow"),
                 Year=c(1960, 1990, 1961, 1994, 1993, 1964, 1960),
                 Weight=c(5.0, 3.0, 4.8, 7.0, 6.0, 8.0, 4.6), stringsAsFactors=FALSE)
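For comparison, the same group-maximum idea can be sketched in base R without dplyr: ave() repeats each ID's maximum Weight on every row of that ID, and rows are kept when that maximum reaches 5. This is a minimal sketch that redefines the df above so it runs on its own; the name group_max is chosen only for illustration.

```r
# Same data as in the answer above
df <- data.frame(ID = c("Brown", "Green", "Yellow", "Green", "Green", "Brown", "Yellow"),
                 Year = c(1960, 1990, 1961, 1994, 1993, 1964, 1960),
                 Weight = c(5.0, 3.0, 4.8, 7.0, 6.0, 8.0, 4.6),
                 stringsAsFactors = FALSE)

# For every row, the maximum Weight of that row's ID group
# (base R counterpart of group_by(ID) %>% mutate(max(Weight)))
group_max <- ave(df$Weight, df$ID, FUN = max)

# Keep every row whose group maximum reaches the threshold, then sort by ID
output <- df[group_max >= 5, ]
output <- output[order(output$ID), ]
output
```

order() is stable, so rows with the same ID keep their original relative order.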

Answer 1 (score: 1)

With dplyr, we can group_by ID and use filter:

library(dplyr)
df %>% group_by(ID) %>% filter(any(Weight >= 5))

#   ID     Year Weight
#  <chr> <dbl>  <dbl>
#1 Brown  1960      5
#2 Green  1990      3
#3 Green  1994      7
#4 Green  1993      6
#5 Brown  1964      8

Or with data.table:

library(data.table)

setDT(df)
df[, .SD[any(Weight >= 5)], ID]
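The question asks for a weight greater than or equal to 5. On the sample data, any(Weight > 5) and any(Weight >= 5) happen to select the same IDs, but they differ for a group whose largest weight is exactly 5. A small base R check, using hypothetical Red and Blue groups made up for illustration:

```r
# Hypothetical data: the largest Weight in the "Red" group is exactly 5
toy <- data.frame(ID = c("Red", "Red", "Blue"),
                  Weight = c(5, 2, 7),
                  stringsAsFactors = FALSE)

# For each ID: does at least one Weight pass the test?
strict    <- tapply(toy$Weight, toy$ID, function(w) any(w > 5))
inclusive <- tapply(toy$Weight, toy$ID, function(w) any(w >= 5))

strict     # Blue TRUE, Red FALSE: Red would be dropped
inclusive  # Blue TRUE, Red TRUE: Red is kept
```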

Answer 2 (score: 1)

Convert to a data.table:

> library(data.table)
> setDT(df)

> df[ID %in% df[Weight >= 5, ID]]
      ID Year Weight
1: Brown 1960      5
2: Green 1990      3
3: Green 1994      7
4: Green 1993      6
5: Brown 1964      8
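The same "find the qualifying IDs, then keep all of their rows" approach works on a plain data.frame with %in%, without converting to data.table. A minimal base R sketch, redefining the question's df so it runs on its own; keep_ids is an illustrative name.

```r
df <- data.frame(ID = c("Brown", "Green", "Yellow", "Green", "Green", "Brown", "Yellow"),
                 Year = c(1960, 1990, 1961, 1994, 1993, 1964, 1960),
                 Weight = c(5.0, 3.0, 4.8, 7.0, 6.0, 8.0, 4.6),
                 stringsAsFactors = FALSE)

# IDs that have at least one Weight >= 5
keep_ids <- unique(df$ID[df$Weight >= 5])

# Keep every row belonging to one of those IDs, in original row order
output <- df[df$ID %in% keep_ids, ]
output
```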

Answer 3 (score: 0)

Here is a base R solution using ave() and subset():

dfout <- subset(df, as.logical(with(df,ave(Weight, ID, FUN = function(x) any(x>=5)))))

which gives:

> dfout
     ID Year Weight
1 Brown 1960      5
2 Green 1990      3
4 Green 1994      7
5 Green 1993      6
6 Brown 1964      8
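Why the as.logical() wrapper is needed: ave() writes each group's result back into a copy of its first argument, so when Weight is numeric the TRUE/FALSE returned by any() come back as 1/0. A minimal check on a few hypothetical values (w and g are made up for illustration):

```r
# Hypothetical weights in two groups, "a" and "b"
w <- c(5, 3, 4.8)
g <- c("a", "b", "a")

flags <- ave(w, g, FUN = function(x) any(x >= 5))
flags              # numeric 1 0 1, not logical
as.logical(flags)  # TRUE FALSE TRUE, usable as a row filter
```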