Subset a data frame to include only levels of one factor that occur in both levels of another factor

Posted: 2017-07-07 01:48:36

Tags: r dataframe subset r-factor

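I'm working with a data frame of numeric measurements. Some individuals have been measured several times, as juveniles and/or as adults. A reproducible example: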

ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each=5)
size <- rnorm(10)

# e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
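# stringsAsFactors = TRUE keeps ID and age as factors (the default before R 4.0.0)
d <- data.frame(ID, age, size, stringsAsFactors = TRUE)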

My goal is to subset this data frame by selecting the IDs that appear at least once as a juvenile and at least once as an adult. I'm not sure how to do this.

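The resulting data frame would contain all measurements of individuals a1, a2 and a3, but none of a4, a5 and a6, since those were not measured at both stages.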
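A quick way to check which IDs meet this condition is to cross-tabulate ID against age. The following is a minimal base R sketch, not taken from any of the answers; it rebuilds `d` with a fixed seed so it runs on its own, and the names `tab` and `both` are illustrative only:

```r
set.seed(1)  # only so that size is reproducible
ID   <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age  <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d <- data.frame(ID, age, size, stringsAsFactors = TRUE)

# Count measurements per ID (rows) and per stage (columns)
tab <- table(d$ID, d$age)

# Keep IDs with at least one measurement in every stage (column)
both <- rownames(tab)[rowSums(tab > 0) == ncol(tab)]
both
#> [1] "a1" "a2" "a3"

# All measurements of the qualifying IDs
d[d$ID %in% both, ]
```

Because the test compares against `ncol(tab)`, the same line works for more than two stages. If `age` carries unused factor levels, drop them first with `droplevels()`, otherwise they appear as all-zero columns and no ID qualifies.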

A similar question was asked 7 months ago but never got an answer (Subset data frame to include only levels one factor that have values in both levels of another factor).

Thanks!

3 answers:

Answer 0 (score: 4)

Using group_by %>% filter, you can do:

library(dplyr)
d %>%
  group_by(ID) %>%
  filter(all(c("juvenile", "adult") %in% age))
# A tibble: 7 x 3
# Groups:   ID [3]
#       ID      age       size
#   <fctr>   <fctr>      <dbl>
# 1     a1 juvenile -0.6947697
# 2     a2 juvenile -0.3665272
# 3     a3 juvenile  1.0293555
# 4     a1 juvenile  0.2745224
# 5     a2    adult  0.5299029
# 6     a1    adult  2.2247802
# 7     a3    adult -0.4717160


Answer 1 (score: 4)

We can split the ID by age, take the intersect of the resulting groups, and subset:

d[d$ID %in% Reduce(intersect, split(d$ID, d$age)),]
#   ID      age        size
#1  a1 juvenile  1.44761836
#2  a2 juvenile  1.70098645
#3  a3 juvenile  0.08231986
#5  a1 juvenile  0.91240568
#6  a2    adult -1.77318962
#9  a1    adult  0.13597986
#10 a3    adult -1.18575294
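To see why this works, it helps to look at the intermediate values. In this sketch (assuming the same example data; IDs are converted to character only so the intermediate values are plain strings), `split()` yields one vector of IDs per age level and `Reduce(intersect, ...)` keeps only the IDs present in every element, so the approach also extends to more than two stages:

```r
ID  <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
d <- data.frame(ID, age, size = rnorm(10), stringsAsFactors = TRUE)

# One vector of IDs per stage; list names follow levels(d$age)
by_stage <- split(as.character(d$ID), d$age)
# by_stage$adult:    "a2" "a5" "a6" "a1" "a3"
# by_stage$juvenile: "a1" "a2" "a3" "a4" "a1"

# IDs common to every stage, in order of first appearance in by_stage$adult
common <- Reduce(intersect, by_stage)
# common: "a2" "a1" "a3"

# Subset the original rows (rows 1, 2, 3, 5, 6, 9, 10)
d[d$ID %in% common, ]
```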

Answer 2 (score: 3)

Here is an option using data.table:

library(data.table)
setDT(d)[, .SD[all(c("juvenile", "adult") %in% age)], ID]

Or, in base R, with ave.
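The code for the ave option did not survive in this copy of the answer. A hedged sketch of how `ave()` could be used here (not necessarily the answerer's exact code): run a function over the row indices of each ID group and keep the rows whose group contains both stages.

```r
ID  <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
d <- data.frame(ID, age, size = rnorm(10), stringsAsFactors = TRUE)

# ave() returns a vector of the same length and type as its first argument.
# Each group's row indices are replaced by the function's result, so every
# row gets 1 (TRUE) or 0 (FALSE) depending on whether its ID has both stages.
keep <- ave(seq_len(nrow(d)), d$ID,
            FUN = function(i) all(c("juvenile", "adult") %in% d$age[i]))

d[as.logical(keep), ]
```

Passing row indices rather than `age` itself keeps the result numeric, so there is no need to compare against the string `"TRUE"` afterwards.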