Comparing characters (factors) across multiple columns

Asked: 2019-10-28 16:04:37

Tags: r dataframe tidyverse

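I have a data frame of proteins with localization annotations taken from several databases. For each protein (row) I want to compare those entries across the location columns. If they all agree, the shared entry should be written to a new column; if they differ, the new column should flag the row as a disagreement.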

I suspect there is a simple solution, but I have not found it yet, so any help is much appreciated! If possible, I would prefer a tidyverse solution :)

Thanks!

Sebastian

start_df <- data.frame(protein = c("A", "B", "C", "D"),
             location_1 = c("membrane", "membrane", "nucleus", "mito"),
             location_2 = c("membrane", "nucleus", "nucleus", "membrane"),
             location_3 = c("membrane", "membrane", "nucleus", "membrane"),
             location_4 = c("membrane", "membrane", "nucleus", "mito"))

expectation <- data.frame(protein = c("A", "B", "C", "D"),
             location_1 = c("membrane", "membrane", "nucleus", "mito"),
             location_2 = c("membrane", "nucleus", "nucleus", "membrane"),
             location_3 = c("membrane", "membrane", "nucleus", "membrane"),
             location_4 = c("membrane", "membrane", "nucleus", "mito"),
             location_all = c("membrane", "disagrement", "nucleus", "disagrement"))

1 Answer:

Answer 0 (score: 1)

You could try the following:

library(tidyverse)

start_df <- data.table::data.table(protein = c("A", "B", "C", "D"),
             location_1 = c("membrane", "membrane", "nucleus", "mito"),
             location_2 = c("membrane", "nucleus", "nucleus", "membrane"),
             location_3 = c("membrane", "membrane", "nucleus", "membrane"),
             location_4 = c("membrane", "membrane", "nucleus", "mito"))

# Transpose so that each protein becomes a column (t() returns a character matrix)
df <- data.table::as.data.table(t(x = start_df))
colnames(df) <- as.character(df[1, ]) # name the columns after the proteins
df1 <- df[-1, ] # drop the protein row so only the location rows are compared
# Unique locations per protein: a single element means all databases agree.
# lapply() always returns a list, whereas apply() may simplify to a vector or matrix.
uniq <- lapply(X = df1, FUN = unique)
start_df$unique <- uniq # add this to your initial dataframe as a list column
# Write the new column: more than one unique location means the databases disagree
res <- start_df %>%
  mutate(location_all = if_else(condition = lengths(start_df$unique) > 1,
                                true = "disagrement",
                                false = location_1))
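
Since the question asks for a tidyverse solution, the same idea can also be sketched row-wise with dplyr alone. This is only a sketch and assumes dplyr 1.0.0 or later (for `c_across()`); the name `res_tidy` is made up for illustration, and the label keeps the spelling used in the question's expected output.

```r
library(dplyr)

start_df <- data.frame(
  protein    = c("A", "B", "C", "D"),
  location_1 = c("membrane", "membrane", "nucleus", "mito"),
  location_2 = c("membrane", "nucleus", "nucleus", "membrane"),
  location_3 = c("membrane", "membrane", "nucleus", "membrane"),
  location_4 = c("membrane", "membrane", "nucleus", "mito"),
  stringsAsFactors = FALSE
)

res_tidy <- start_df %>%
  rowwise() %>% # evaluate the following mutate() one row at a time
  mutate(
    # One distinct value across the location columns means all databases agree
    location_all = if (n_distinct(c_across(starts_with("location_"))) == 1) {
      location_1
    } else {
      "disagrement" # spelled as in the question's expected output
    }
  ) %>%
  ungroup()
```

Printing `res_tidy` shows the original columns plus `location_all`. Row-wise evaluation is easy to read but can be slow on large tables.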

You may run into problems if the strings differ slightly, e.g. "membrane" and "Membrane".
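
One way to guard against such near-duplicates is to normalise the strings before comparing them. Below is a minimal base R sketch, assuming that letter case and surrounding whitespace are the only differences; `messy_df` and its values are invented for illustration.

```r
# Hypothetical data: the same location written with inconsistent case and whitespace
messy_df <- data.frame(
  protein    = c("A", "B"),
  location_1 = c("membrane", "Nucleus"),
  location_2 = c(" Membrane", "nucleus "),
  stringsAsFactors = FALSE
)

loc_cols <- grep("^location_", names(messy_df), value = TRUE)

# Normalise: strip surrounding whitespace and lower-case every location column
messy_df[loc_cols] <- lapply(messy_df[loc_cols], function(x) tolower(trimws(x)))

# Count the distinct locations per row after normalisation
n_unique <- unname(apply(messy_df[loc_cols], 1, function(x) length(unique(x))))

# Keep the shared value when all columns agree, otherwise flag the row
messy_df$location_all <- ifelse(n_unique == 1, messy_df$location_1, "disagrement")
```

Without the normalisation step both rows would be flagged as disagreements, although the annotations mean the same thing.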

Hope this helps, and all the best.