R - Find rows by group factor

Time: 2015-06-30 14:58:10

Tags: r

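I'm trying to figure out a way to find specific values based on each factor in R. In other words, how can I keep all rows of every factor level that meets a certain condition, even when a particular row fails that condition, as long as another row with the same factor level passes it?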

So I have something like this:

   gender values  fruit
1       M     20  apple
2       M     22   pear
3       F     24  mango
4       F     19  mango
5       F      9  mango
6       F     17  apple
7       M     18 banana
8       M     22 banana
9       M     12 banana
10      M     14  mango
11      F      7  apple
12      F      8  apple

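I want every fruit that has at least one F gender (even if that fruit also has some M's). There could also be more than two genders, e.g. neutral (not shown). So my ideal output would be: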

   gender values  fruit
1       M     20  apple
3       F     24  mango
4       F     19  mango
5       F      9  mango
6       F     17  apple
10      M     14  mango
11      F      7  apple
12      F      8  apple

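Note that banana and pear are both missing, because those fruits only have M and no F. Also, rows 1 and 10 are still present even though they are M, because other apple and mango rows have F, so those fruits still qualify. Please let me know if this is possible. Thanks!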

Here is my code to reproduce this data:

gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple","banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)

Here is what I have tried so far:

df[duplicated(df[,c("fruit","gender")]),]
ave(df$gender, df$fruit, FUN=function(x) ifelse(x=='F','yes','no'))

Also, third-party libraries are welcome, but I'd prefer to stay in base R (the stats and plyr packages are fine, since I have those on my system).

3 answers:

Answer 0: (score: 3)

df[df$fruit %in% unique(df[df$gender =='F', ]$fruit),]
#   gender values fruit
#1       M     20 apple
#3       F     24 mango
#4       F     19 mango
#5       F      9 mango
#6       F     17 apple
#10      M     14 mango
#11      F      7 apple
#12      F      8 apple
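
As a side note, the `ave` attempt from the question can be adapted to give the same result. The sketch below is not part of the original answer; it assumes the data frame is built with `stringsAsFactors = FALSE`, and the names `has_f` and `result` are only illustrative. The idea is to pass a logical vector to `ave` and use `any` as the per-group function, so every row receives TRUE when its fruit has at least one F.

```r
# Rebuild the example data (stringsAsFactors = FALSE is an assumption made here
# so that gender and fruit stay plain character vectors).
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple",
           "banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit, stringsAsFactors = FALSE)

# For each fruit, ask whether any of its rows is "F"; ave() spreads that
# single TRUE/FALSE back over all rows belonging to the same fruit.
has_f <- ave(df$gender == "F", df$fruit, FUN = any)

# Keep the rows whose fruit has at least one F, in the original row order.
result <- df[has_f, ]
print(result)
```

Because the test is `gender == "F"`, additional gender levels such as a neutral category should not affect which fruits are kept.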

Answer 1: (score: 3)

A possible data.table approach

library(data.table)
setDT(df)[, if(any(gender == "F")) .SD, by = fruit]
#    fruit gender values
# 1: apple      M     20
# 2: apple      F     17
# 3: apple      F      7
# 4: apple      F      8
# 5: mango      F     24
# 6: mango      F     19
# 7: mango      F      9
# 8: mango      M     14

I like the other approach, so here is a data.table equivalent using a binary join

setkey(setDT(df), fruit)[.(unique(df[gender == "F", fruit], by = "fruit"))]
#    gender values fruit
# 1:      F     17 apple
# 2:      F      7 apple
# 3:      F      8 apple
# 4:      M     20 apple
# 5:      F     24 mango
# 6:      F     19 mango
# 7:      F      9 mango
# 8:      M     14 mango

Answer 2: (score: 1)

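Base R and data.table are already covered, so here I offer a dplyr solution, even though some of the output differs (at least in the order of the results).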

library(dplyr)
df %>% group_by(fruit) %>% filter(any(gender == "F"))
# Source: local data frame [8 x 3]
# Groups: fruit
#
#   gender values fruit
# 1      M     20 apple
# 2      F     24 mango
# 3      F     19 mango
# 4      F      9 mango
# 5      F     17 apple
# 6      M     14 mango
# 7      F      7 apple
# 8      F      8 apple
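
For comparison, the group-wise logic used in the data.table and dplyr answers (keep a whole group when any of its rows is F) can also be sketched in base R with `split`, `Filter` and `rbind`. This is only an illustration under the assumption that the data frame is built with `stringsAsFactors = FALSE`; the names `groups`, `kept` and `by_group` are hypothetical and do not come from the answers above.

```r
# Rebuild the example data (stringsAsFactors = FALSE is an assumption made here).
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple",
           "banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit, stringsAsFactors = FALSE)

# Split the data frame into one piece per fruit.
groups <- split(df, df$fruit)

# Keep only the pieces that contain at least one "F" row.
kept <- Filter(function(d) any(d$gender == "F"), groups)

# Stack the surviving pieces back into a single data frame.
# Rows come out grouped by fruit rather than in the original order.
by_group <- do.call(rbind, kept)
print(by_group)
```

As with the data.table result, the rows end up grouped by fruit, so the order differs from the `%in%` approach even though the same rows are kept.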