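I have a data frame of proteins with localization annotations taken from several databases. For each protein (row) I want to compare those entries: if they all agree, write the shared entry into a new column; if the entries in the columns differ, write "disagreement" instead.
I suspect there is a simple solution, but I have not found it yet, so any help is much appreciated! If possible, I would prefer a tidyverse solution :)
Thanks!
Sebastian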
start_df <- data.frame(protein = c("A", "B", "C", "D"),
location_1 = c("membrane", "membrane", "nucleus", "mito"),
location_2 = c("membrane", "nucleus", "nucleus", "membrane"),
location_3 = c("membrane", "membrane", "nucleus", "membrane"),
location_4 = c("membrane", "membrane", "nucleus", "mito"))
expectation <- data.frame(protein = c("A", "B", "C", "D"),
location_1 = c("membrane", "membrane", "nucleus", "mito"),
location_2 = c("membrane", "nucleus", "nucleus", "membrane"),
location_3 = c("membrane", "membrane", "nucleus", "membrane"),
location_4 = c("membrane", "membrane", "nucleus", "mito"),
location_all = c("membrane", "disagrement", "nucleus", "disagrement"))
Answer 0 (score: 1)
You can try the following:
library(tidyverse)
start_df <- data.table::data.table(protein = c("A", "B", "C", "D"),
location_1 = c("membrane", "membrane", "nucleus", "mito"),
location_2 = c("membrane", "nucleus", "nucleus", "membrane"),
location_3 = c("membrane", "membrane", "nucleus", "membrane"),
location_4 = c("membrane", "membrane", "nucleus", "mito"))
# Transpose so that each protein becomes a column and each database a row
df <- data.table::as.data.table(t(x = start_df))
colnames(df) <- as.character(df[1, ])
# Drop the first row (the protein names) so that only the location annotations are compared
df1 <- df[-1, ]
# Unique locations per protein: a single element means all databases agree.
# lapply always returns a list, whereas apply may simplify to a matrix when the lengths match.
ls <- lapply(X = df1, FUN = unique)
# Number of distinct locations per protein, added to the initial data frame
start_df$n_unique <- lengths(ls)
# Keep the original entry when all databases agree, otherwise flag the row
res <- start_df %>% mutate(location_all = if_else(condition = n_unique > 1, true = "disagrement", false = location_1))
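Since the question asks for a tidyverse solution, the same per-row comparison can probably also be written directly with dplyr, without transposing. The following is only a sketch, assuming dplyr 1.0.0 or later (for `rowwise()` and `c_across()`) and that all annotation columns share the `location_` prefix. The name `res_tidy` is just illustrative, and the label keeps the spelling used in the question's expected output.

```r
library(dplyr)

start_df <- data.frame(
  protein    = c("A", "B", "C", "D"),
  location_1 = c("membrane", "membrane", "nucleus", "mito"),
  location_2 = c("membrane", "nucleus", "nucleus", "membrane"),
  location_3 = c("membrane", "membrane", "nucleus", "membrane"),
  location_4 = c("membrane", "membrane", "nucleus", "mito")
)

res_tidy <- start_df %>%
  rowwise() %>%  # evaluate the mutate() expression one protein at a time
  mutate(
    # one distinct value across the location columns means every database agrees
    location_all = if (n_distinct(c_across(starts_with("location_"))) == 1) {
      location_1
    } else {
      "disagrement"  # spelled as in the question's expected output
    }
  ) %>%
  ungroup()
```

On the example data this should flag proteins B and D and keep "membrane" and "nucleus" for A and C. `rowwise()` can be slow on very large tables, so treat this as a readable starting point, not a tuned implementation.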
You may run into problems if the strings differ slightly, for example " membrane" versus "Membrane".
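One way to guard against that is to normalize the annotation strings before comparing them. This is a minimal sketch, assuming that differences in case and surrounding whitespace are the only ones that matter. `messy_df` is made-up example data, not taken from the question, and `across()` needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical input with inconsistent case and stray whitespace
messy_df <- data.frame(
  protein    = c("A", "B"),
  location_1 = c(" membrane", "nucleus"),
  location_2 = c("Membrane", "mito")
)

clean_df <- messy_df %>%
  # trim surrounding whitespace, then lower-case every location column
  mutate(across(starts_with("location_"), ~ tolower(trimws(.x))))
```

After this step both entries for protein A read "membrane", so they would count as agreeing. Synonyms such as "mito" versus "mitochondrion" would still need an explicit mapping.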
I hope this helps, and all the best.