How to drop observations conditionally with same values in R

Time: 2017-12-18 07:19:15

Tags: r subset

I'm trying to subset a data frame on an age condition. However, I want the condition to apply across multiple observations at once.

The data frame has 10 observations, with the variables 'household id', 'household relation', and 'age'. 'Household id' is the household number uniquely assigned to each house. 'Household relation' is the person's position in the household: '1' means the person is the head of the household, and '2' means he/she is the spouse of the head. 'Age' is the person's age.

    Household_id     Household_relation    Age 
1            2                1            27
2            2                2            34  
3            4                1            22
4            4                2            23
5            7                2            21
6            7                1            29  
7            9                1            33  
8            9                2            34
9           11                1            31
10          11                2            29

So the data consists of one couple per household. I want to drop the couples in which neither person is in their 20s. If at least one of them is in their 20s, the couple stays (so household id 2 stays). But if neither is in their 20s, I want to drop both of them from the data (for example, household id 9 should be dropped). So the subsetting has to be conditional on two observations at a time.

Since my real data has more than 10000 observations, the syntax should be short enough to subset all the data. I tried to do this with a 'for' loop, but couldn't figure out how.

How can I do this procedure in R?

Below is my reproducible example code.

Household_id <- c(2,2,4,4,7,7,9,9,11,11)
Household_relation <- c(1,2,1,2,2,1,1,2,1,2)
Age <- c(27,34,22,23,21,29,33,34,31,29)
data <- data.frame(Household_id, Household_relation, Age)

2 answers:

Answer 0 (score: 3)

In dplyr we can use filter to keep the groups that have any member in their 20s.

library(dplyr)
data %>%
   group_by(Household_id) %>%
   filter(any(Age >= 20 & Age < 30))

# Household_id  Household_relation   Age
#         <dbl>              <dbl> <dbl>
#1            2                  1    27
#2            2                  2    34
#3            4                  1    22
#4            4                  2    23
#5            7                  2    21
#6            7                  1    29
#7           11                  1    31
#8           11                  2    29

The base R equivalent with ave would be

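# ave() writes the per-household TRUE/FALSE back into a numeric vector (1/0),
# one value per row; as.logical() turns that back into a row mask.
in_20s <- function(x) any(x >= 20 & x < 30)
data[as.logical(ave(data$Age, data$Household_id, FUN = in_20s)), ]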
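
If the ave call is hard to read, the same rule (keep a household when at least one member is aged 20 to 29) can also be written in base R with %in%. This is only a sketch of an alternative, not part of the original answer; the names keep_ids and kept are made up for illustration.

```r
# Example data from the question
Household_id <- c(2, 2, 4, 4, 7, 7, 9, 9, 11, 11)
Household_relation <- c(1, 2, 1, 2, 2, 1, 1, 2, 1, 2)
Age <- c(27, 34, 22, 23, 21, 29, 33, 34, 31, 29)
data <- data.frame(Household_id, Household_relation, Age)

# Households that contain at least one person aged 20-29
# (keep_ids is an illustrative name)
keep_ids <- unique(data$Household_id[data$Age >= 20 & data$Age < 30])

# Keep every row belonging to one of those households
kept <- data[data$Household_id %in% keep_ids, ]
kept
```

This avoids any per-group function call: one vectorized comparison finds the qualifying households, and a second selects their rows, so it should scale comfortably to data with more than 10000 observations.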

Answer 1 (score: 1)

You can, of course, translate this to "data.table" like:
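
The code for this answer is not preserved here. What follows is a minimal sketch of how a data.table translation of the grouped filter could look, assuming the data.table package is installed; it is not necessarily what the answerer wrote, and the names dt and result are illustrative.

```r
library(data.table)

# Example data from the question
Household_id <- c(2, 2, 4, 4, 7, 7, 9, 9, 11, 11)
Household_relation <- c(1, 2, 1, 2, 2, 1, 1, 2, 1, 2)
Age <- c(27, 34, 22, 23, 21, 29, 33, 34, 31, 29)
data <- data.frame(Household_id, Household_relation, Age)

dt <- as.data.table(data)  # dt is an illustrative name

# For each household, return its rows (.SD) only if at least one member
# is aged 20-29; groups where the condition is FALSE return NULL and drop out.
result <- dt[, if (any(Age >= 20 & Age < 30)) .SD, by = Household_id]
result
```

The grouping column comes first in the result, followed by the remaining columns of each kept group.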
