I have a dataset like this:
df<-data.frame(x=c("A","A","A","A", "B","B","B","B","B",
"C","C","C","C","C","D","D","D","D","D"),
y= as.factor(c(rep("Eoissp2",4),rep("Eoissp1",5),"Eoissp1","Eoisp4","Automerissp1","Automerissp2","Acharias",rep("Eoissp2",3),rep("Eoissp1",2))))
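For each subset of x, I want to identify the levels whose y values consist exclusively of entries containing the expression Eois. So A, B and D should be returned in a vector, because every y value in A, B and D contains the expression Eois, whereas level C is made up of several different taxa (e.g. Eois, Automeris and Acharias). For this example, the output would be: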
output<- c("A", "B", "D")
Answer 0: (score: 0)
Using the new df (with dplyr and stringr loaded):
> df %>% filter(str_detect(y,"Eois")) %>% group_by(x) %>% distinct(y) %>%
count() %>% filter(n==1) %>% select(x)
# A tibble: 2 x 1
# Groups: x [2]
x
<fct>
1 A
2 B
(The answer below uses the original df posted by the question author.)
Using the pipe operator from magrittr and functions from dplyr:
> df %>% group_by(x) %>% distinct(y)
# A tibble: 7 x 2
# Groups: x [3]
x y
<fct> <fct>
1 A plant1a
2 B plant1b
3 C plant1a
4 C plant2a
5 C plant3a
6 C plant4a
7 C plant5a
Then you can summarise the results like this:
> results <- df %>% group_by(x) %>% distinct(y) %>%
count() %>% filter(n==1) %>% select(x)
> results
# A tibble: 2 x 1
# Groups: x [2]
x
<fct>
1 A
2 B
If you know that the original data frame always arrives with x in order, you can drop the group_by part.
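As a rough sketch of the same distinct-and-count idea without group_by, both columns can be passed to distinct() and x to count(). The data frame `df_old` below is a hypothetical stand-in: it only reproduces the distinct (x, y) pairs shown in the output above, and the number of repeated rows is made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the original df: same distinct (x, y) pairs as in
# the output above; the repeat counts are invented for illustration only.
df_old <- data.frame(
  x = c(rep("A", 3), rep("B", 2), rep("C", 5)),
  y = c(rep("plant1a", 3), rep("plant1b", 2),
        "plant1a", "plant2a", "plant3a", "plant4a", "plant5a")
)

single_y <- df_old %>%
  distinct(x, y) %>%   # keep one row per (x, y) combination
  count(x) %>%         # number of distinct y values for each x
  filter(n == 1) %>%   # keep x levels with exactly one distinct y
  pull(x)              # return the x values as a plain vector

single_y
```

Because distinct(x, y) keeps x in the result, this version does not depend on the row order of the input.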
Answer 1: (score: 0)
A dplyr-based solution could be:
library(dplyr)
df %>% group_by(x) %>%
filter(grepl("Eoiss", y)) %>%
mutate(y = sub("\\d+", "", y)) %>%
filter(n() >1 & length(unique(y)) == 1) %>%
select(x) %>% unique(.)
# A tibble: 3 x 1
# Groups: x [3]
# x
# <fctr>
#1 A
#2 B
#3 D
Data
df<-data.frame(x=c("A","A","A","A", "B","B","B","B","B",
"C","C","C","C","C","D","D","D","D","D"),
y= as.factor(c(rep("Eoissp2",4),
rep("Eoissp1",5),"Eoissp1","Eoisp4","Automerissp1","Automerissp2",
"Acharias",rep("Eoissp2",3),rep("Eoissp1",2))))
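For comparison, here is a minimal base R sketch that needs no packages. It assumes one particular reading of the question: a level of x qualifies when every one of its y values contains the string "Eois". This is not the same logic as the dplyr pipeline above (which filters on "Eoiss" and strips digits), and the names `all_eois` and `output` are illustrative only.

```r
# Same data as above, written with rep() for the x column.
df <- data.frame(
  x = c(rep("A", 4), rep("B", 5), rep("C", 5), rep("D", 5)),
  y = as.factor(c(rep("Eoissp2", 4),
                  rep("Eoissp1", 5), "Eoissp1", "Eoisp4", "Automerissp1", "Automerissp2",
                  "Acharias", rep("Eoissp2", 3), rep("Eoissp1", 2)))
)

# For each level of x, check whether every y value contains "Eois".
all_eois <- vapply(
  split(as.character(df$y), df$x),
  function(v) all(grepl("Eois", v, fixed = TRUE)),
  logical(1)
)

# Keep the names of the levels where the check holds.
output <- names(all_eois)[all_eois]
output
```

On this data, C is dropped because it also contains Automeris and Acharias entries, so the result matches the vector requested in the question.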