Avoiding nested loops in multi-column comparisons

Time: 2019-01-31 15:10:30

Tags: r

I have a data frame like this:

df <- data.frame(Patient.ID = rep(paste("Pat", seq(1:3), sep = ""), 2),
             Gene = c(rep("Gene1", 3), rep("Gene2", 3)),
             Ref = c("A", "C", "G", "T", "A", "T"),
             Tum1 = c("A", "A", "T", "T", "A", "T"),
             Tum2 = c("A", "C", "G", "G", "C", "C"))

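What I want to do is identify the change that occurred between the Ref column and the Tum columns. In other words, if Tum1 and Tum2 differ, take whichever string differs from the Ref column and store it in a separate column as the change, so that the data frame above becomes: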

df <- data.frame(Patient.ID = rep(paste("Pat", seq(1:3), sep = ""), 2),
             Gene = c(rep("Gene1", 3), rep("Gene2", 3)),
             Ref = c("A", "C", "G", "T", "A", "T"),
             Tum1 = c("A", "A", "T", "T", "A", "T"),
             Tum2 = c("A", "C", "G", "G", "C", "C"),
             BaseChange = c("NoCh", "C.A", "G.T", "T.G", "A.C", "T.C"))

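I know I could solve this with nested ifelse() statements like the one below (extended to cover every case), but my actual data frame has many more combinations, so I think there must be a "safer" way to do this.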

df$BaseChange <- as.factor(ifelse(df$Ref == "C" & df$Tum1 == "A" | df$Ref== "C" & df$Tum2 == "A", "C.A",
                              ifelse((df$Ref == "G" & df$Tum1 == "T" | df$Ref == "G" & df$Tum2 == "T"), "G.T",...)))

Any help would be greatly appreciated.

2 Answers:

Answer 0: (score: 1)

It isn't pretty, but it works:

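library(dplyr)  # provides %>% and mutate()

# Compare as character so that factor columns with different level sets can be tested with ==
df <- df %>%
  mutate(BaseChange2 = ifelse( (as.character(Ref)==as.character(Tum1) & as.character(Ref) == as.character(Tum2)), "NoCh",
                                         ifelse(as.character(Ref)==as.character(Tum1),paste(Ref,Tum2, sep="."),paste(Ref,Tum1, sep="."))))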
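
For comparison, the same rule can be sketched in base R without dplyr. This is only a sketch under a few assumptions: the columns are stored as character (hence `stringsAsFactors = FALSE`), there are exactly two tumour columns, and `alt` is just an illustrative helper name that is not part of the original answer.

```r
# Rebuild the example data frame from the question, keeping strings as character.
df <- data.frame(Patient.ID = rep(paste0("Pat", 1:3), 2),
                 Gene = rep(c("Gene1", "Gene2"), each = 3),
                 Ref  = c("A", "C", "G", "T", "A", "T"),
                 Tum1 = c("A", "A", "T", "T", "A", "T"),
                 Tum2 = c("A", "C", "G", "G", "C", "C"),
                 stringsAsFactors = FALSE)

# Pick the tumour call that differs from Ref: use Tum1 if it differs, otherwise Tum2.
alt <- ifelse(df$Tum1 != df$Ref, df$Tum1, df$Tum2)

# If the chosen call still equals Ref, neither tumour column changed.
df$BaseChange <- ifelse(alt == df$Ref, "NoCh", paste(df$Ref, alt, sep = "."))

print(df$BaseChange)
```

As in the answer above, when both tumour columns differ from Ref, only the Tum1 change is reported.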

Answer 1: (score: 0)

It seems you need to paste together the unique values across the Ref and Tum columns, i.e.

apply(df[3:5], 1, function(i) paste0(unique(i), collapse = '.'))
#[1] "A"   "C.A" "G.T" "T.G" "A.C" "T.C" 

To replace the lone "A" in the first element (the row with no change):

v2 <- apply(df[3:5], 1, function(i) paste0(unique(i), collapse = '.'))
replace(v2, nchar(v2) == 1, 'NoChange')
#[1] "NoChange" "C.A"      "G.T"      "T.G"      "A.C"      "T.C"
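
If the real data has more than two Tum columns, the same idea might be sketched so that the columns are picked up by name rather than by position. Everything introduced here is an assumption for illustration: `base_change` is a hypothetical helper, the `^Tum` pattern assumes all tumour columns share that prefix, and "NoCh" is used as the label to match the question.

```r
# Rebuild the example data frame from the question, keeping strings as character.
df <- data.frame(Patient.ID = rep(paste0("Pat", 1:3), 2),
                 Gene = rep(c("Gene1", "Gene2"), each = 3),
                 Ref  = c("A", "C", "G", "T", "A", "T"),
                 Tum1 = c("A", "A", "T", "T", "A", "T"),
                 Tum2 = c("A", "C", "G", "G", "C", "C"),
                 stringsAsFactors = FALSE)

# Find every tumour column by its name prefix (assumed naming convention).
tum_cols <- grep("^Tum", names(df), value = TRUE)

# Hypothetical helper: collect the distinct tumour calls that differ from ref.
base_change <- function(ref, tums) {
  alts <- unique(tums[tums != ref])
  if (length(alts) == 0) "NoCh" else paste(c(ref, alts), collapse = ".")
}

# Apply the helper row by row; vapply guarantees one string per row.
tum_mat <- as.matrix(df[tum_cols])
df$BaseChange <- vapply(seq_len(nrow(df)),
                        function(i) base_change(df$Ref[i], tum_mat[i, ]),
                        character(1))

print(df$BaseChange)
```

Note that a row whose tumour columns differ from Ref in more than one way would get a longer label such as "A.C.G", so whether that is acceptable depends on the data.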