Merge based on a condition

Time: 2018-07-12 00:19:54

Tags: r database for-loop bioinformatics

I have two databases:

dfgenus <- c("Coragyps", "Elanus", "Elanus", "Patagioenas", "Crotophaga")

which looks like this:

dfgenus
       Genus
1     Coragyps
2       Elanus
3       Elanus
4  Patagioenas
5   Crotophaga

family <- c("Cathartidae", "Accipitridae", "Cuculidae", "Columbidae", "Psittacidae")
Genus <- c("Coragyps", "Elanus", "Crotophaga", "Patagioenas", "Pyrrhura")

sacc <- data.frame(family, Genus)
## Rows of sacc are correct: each genus sits next to its taxonomic family

sacc
  family       Genus
1  Cathartidae    Coragyps
2 Accipitridae      Elanus
3    Cuculidae  Crotophaga
4   Columbidae Patagioenas
5  Psittacidae    Pyrrhura

Using the information in "sacc", how can I add the correct family to each genus in "dfgenus"?

I have tried the following, without success:

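for (i in length(dfgenus)) {             # runs once, with i = 5; seq_along(dfgenus) was probably intended
  if (identical(sacc[i], dfgenus[i])) {  # sacc[i] selects column i, not row i, so sacc[5] fails
    df$family[i] <- sacc$family[i]       # df was never created; also assumes both tables share row order
  } else {
    i - 1                                # result is discarded; it does not step the loop back
  }
  print(df$family)
}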
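For comparison, here is a minimal sketch of a loop that does produce the desired table, assuming the goal is to look each genus up anywhere in sacc rather than at the same row position. The names family_out and hit are illustrative and not from the question; stringsAsFactors = FALSE is added so the text columns stay character on R versions before 4.0.

```r
# Data from the question
dfgenus <- c("Coragyps", "Elanus", "Elanus", "Patagioenas", "Crotophaga")
sacc <- data.frame(
  family = c("Cathartidae", "Accipitridae", "Cuculidae", "Columbidae", "Psittacidae"),
  Genus  = c("Coragyps", "Elanus", "Crotophaga", "Patagioenas", "Pyrrhura"),
  stringsAsFactors = FALSE
)

# One family slot per genus; a slot stays NA if the genus is not in sacc
family_out <- rep(NA_character_, length(dfgenus))

# seq_along() visits every position 1..length(dfgenus)
for (i in seq_along(dfgenus)) {
  # Search all rows of sacc for this genus instead of comparing row i with row i
  hit <- which(sacc$Genus == dfgenus[i])
  if (length(hit) > 0) {
    family_out[i] <- sacc$family[hit[1]]
  }
}

df <- data.frame(family = family_out, Genus = dfgenus, stringsAsFactors = FALSE)
print(df)
```

This works, but the vectorised approaches in the answers do the same lookup without an explicit loop.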

The output should be:

df
        family       Genus
1  Cathartidae    Coragyps
2 Accipitridae      Elanus
3 Accipitridae      Elanus
4   Columbidae Patagioenas
5    Cuculidae  Crotophaga

2 answers:

Answer 0 (score: 1)

A dplyr solution:

library(dplyr)

dbgenus <- data.frame(genus = c("Coragyps", "Elanus", "Elanus", "Patagioenas", "Crotophaga"))
family <- c("Cathartidae", "Accipitridae", "Cuculidae", "Columbidae", "Psittacidae")
genus <- c("Coragyps", "Elanus", "Crotophaga", "Patagioenas", "Pyrrhura")

sacc <- data.frame(family, genus)

# Keep every row of dbgenus and add the matching family from sacc
dbgenus %>% left_join(sacc, by = "genus")

Answer 1 (score: 1)

There are several ways to get this result, and none of them needs a for loop :)
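One such way, sketched here as an illustration rather than something either answer uses, is a named character vector acting as a lookup table: the names are the genera and the values are their families. The name genus_to_family is hypothetical.

```r
dfgenus <- c("Coragyps", "Elanus", "Elanus", "Patagioenas", "Crotophaga")
family  <- c("Cathartidae", "Accipitridae", "Cuculidae", "Columbidae", "Psittacidae")
Genus   <- c("Coragyps", "Elanus", "Crotophaga", "Patagioenas", "Pyrrhura")

# Lookup table: names are genera, values are families
genus_to_family <- setNames(family, Genus)

# Indexing by name returns one family per element of dfgenus, duplicates included;
# a genus missing from the table would give NA
df <- data.frame(family = unname(genus_to_family[dfgenus]),
                 Genus  = dfgenus,
                 stringsAsFactors = FALSE)
print(df)
```

Row order follows dfgenus, as in the expected output.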

If you turn dfgenus into a data frame (with a single column), you can look into the merge() function or the join functions in the dplyr package.
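A minimal base R sketch of the merge() route, assuming dfgenus is first wrapped in a one-column data frame (dfgenus_df is a hypothetical name). all.x = TRUE makes it a left join, so every genus in dfgenus_df is kept.

```r
dfgenus_df <- data.frame(Genus = c("Coragyps", "Elanus", "Elanus", "Patagioenas", "Crotophaga"),
                         stringsAsFactors = FALSE)
sacc <- data.frame(
  family = c("Cathartidae", "Accipitridae", "Cuculidae", "Columbidae", "Psittacidae"),
  Genus  = c("Coragyps", "Elanus", "Crotophaga", "Patagioenas", "Pyrrhura"),
  stringsAsFactors = FALSE
)

# Left join on the shared Genus column; genera absent from sacc would get NA
merged <- merge(dfgenus_df, sacc, by = "Genus", all.x = TRUE)
print(merged)
```

Note that merge() sorts the result by the key column by default (sort = TRUE), so the rows come back in alphabetical order of Genus rather than in the original order of dfgenus.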

But with the data as it is, you can use match():

newdf <- data.frame(Genus  = dfgenus, 
                    Family = sacc[match(dfgenus, sacc$Genus), "family"])

        Genus       Family
1    Coragyps  Cathartidae
2      Elanus Accipitridae
3      Elanus Accipitridae
4 Patagioenas   Columbidae
5  Crotophaga    Cuculidae

match() returns the row number in sacc where each genus is found, and those row numbers are then used to subset the family column.
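To see this step on its own, the intermediate index vector can be printed directly. "Vultur" below is a made-up lookup that is not in sacc, included only to show what happens when a genus has no match.

```r
dfgenus <- c("Coragyps", "Elanus", "Elanus", "Patagioenas", "Crotophaga")
sacc <- data.frame(
  family = c("Cathartidae", "Accipitridae", "Cuculidae", "Columbidae", "Psittacidae"),
  Genus  = c("Coragyps", "Elanus", "Crotophaga", "Patagioenas", "Pyrrhura"),
  stringsAsFactors = FALSE
)

# Position of each dfgenus element within sacc$Genus
idx <- match(dfgenus, sacc$Genus)
print(idx)              # 1 2 2 4 3

# Those positions are row numbers, so they pick out each genus's family
print(sacc$family[idx])

# A genus that is not in sacc yields NA rather than an error
print(match("Vultur", sacc$Genus))
```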