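I'm trying to find a way to select rows in R based on a condition evaluated per factor level. In other words, how do I keep all rows of every factor level that meets a condition in at least one row, even if a particular row of that level fails the condition?
So I have something like this: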
gender values fruit
1 M 20 apple
2 M 22 pear
3 F 24 mango
4 F 19 mango
5 F 9 mango
6 F 17 apple
7 M 18 banana
8 M 22 banana
9 M 12 banana
10 M 14 mango
11 F 7 apple
12 F 8 apple
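I want every fruit that has at least one F gender (even if that fruit also has some M's). There may also be more than two genders, e.g. neutral (not shown). So my ideal output is: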
gender values fruit
1 M 20 apple
3 F 24 mango
4 F 19 mango
5 F 9 mango
6 F 17 apple
10 M 14 mango
11 F 7 apple
12 F 8 apple
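Note that banana and pear are both gone, because those fruits have only M and no F. Also, rows 1 and 10 are still there even though they are M, because other apple and mango rows have F, so the condition still holds for those fruits. Please let me know if this is possible. Thanks!
Here is the code to reproduce this data: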
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple","banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)
Here is what I have tried so far:
df[duplicated(df[,c("fruit","gender")]),]
ave(df$gender, df$fruit, FUN=function(x) ifelse(x=='F','yes','no'))
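Also, third-party libraries are welcome, but I'd prefer to stay within base R (the stats and plyr packages are both fine, since I have those on my system).
Answer 0 (score: 3)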
df[df$fruit %in% unique(df[df$gender =='F', ]$fruit),]
# gender values fruit
#1 M 20 apple
#3 F 24 mango
#4 F 19 mango
#5 F 9 mango
#6 F 17 apple
#10 M 14 mango
#11 F 7 apple
#12 F 8 apple
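For readers who find the one-liner dense, here is a minimal sketch of the same idea split into two steps, followed by a variant built on the `ave()` call the question already tried. The names `has_f`, `kept`, `keep_flag` and `kept_ave` are only illustrative and do not appear in the original answer.

```r
# Rebuild the example data from the question
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple","banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)

# Step 1: collect the fruits that have at least one "F" row
has_f <- unique(df$fruit[df$gender == "F"])

# Step 2: keep every row whose fruit is in that set, whatever its gender
kept <- df[df$fruit %in% has_f, ]
# keeps rows 1, 3, 4, 5, 6, 10, 11, 12 (apple and mango only)

# Variant with ave(): flag each row by whether its fruit group contains any "F"
keep_flag <- ave(df$gender == "F", df$fruit, FUN = any)
kept_ave <- df[keep_flag, ]
```

Both versions preserve the original row order and row names, and they should work unchanged if further gender levels (such as a neutral one) are present, since only the comparison against "F" matters.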
Answer 1 (score: 3)
A possible data.table approach:
library(data.table)
setDT(df)[, if(any(gender == "F")) .SD, by = fruit]
# fruit gender values
# 1: apple M 20
# 2: apple F 17
# 3: apple F 7
# 4: apple F 8
# 5: mango F 24
# 6: mango F 19
# 7: mango F 9
# 8: mango M 14
I like the other approach, so here is the data.table equivalent using a binary join:
setkey(setDT(df), fruit)[.(unique(df[gender == "F", fruit], by = "fruit"))]
# gender values fruit
# 1: F 17 apple
# 2: F 7 apple
# 3: F 8 apple
# 4: M 20 apple
# 5: F 24 mango
# 6: F 19 mango
# 7: F 9 mango
# 8: M 14 mango
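Answer 2 (score: 1)
Base R and data.table are already covered, so here is a dplyr solution, even though the outputs differ somewhat between approaches (at least in row order).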
library(dplyr)
df %>% group_by(fruit) %>% filter(any(gender == "F"))
# Source: local data frame [8 x 3]
# Groups: fruit
# gender values fruit
# 1 M 20 apple
# 2 F 24 mango
# 3 F 19 mango
# 4 F 9 mango
# 5 F 17 apple
# 6 M 14 mango
# 7 F 7 apple
# 8 F 8 apple