Omit NA values when pasting two column values together in R

Time: 2015-12-15 04:18:00

Tags: r

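I have a data frame called dd2 (strictly speaking, the dput below is a character matrix). I need to paste the values in Left.Gene.Symbols and Right.Gene.Symbols together, which I can do with the code below, but I don't want NAs pasted in when a value is missing. I want the output to look like the combination column shown in result.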

My code

# convert literal 'NA' strings into real NA values
dd2[dd2 == 'NA'] <- NA
# paste the two gene-symbol columns together
result <- cbind(dd2, combination = paste(dd2[,"Left.Gene.Symbols"], dd2[,"Right.Gene.Symbols"], sep="*"))
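To see why unwanted NAs show up with the code above, here is a minimal sketch using two toy vectors (not the full dd2): paste() coerces a missing value to the literal text "NA" instead of dropping it.

```r
# paste() converts NA to the string "NA" rather than omitting it
left  <- c("AK2", "HFM1")
right <- c(NA, "PPT")
naive <- paste(left, right, sep = "*")
print(naive)
# [1] "AK2*NA"   "HFM1*PPT"
```

So the missing value has to be replaced (for example with "") before pasting, which is what all three answers below do in different ways.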

Data

dd2<- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP", 
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT", 
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id", 
"Left.Gene.Symbols", "Right.Gene.Symbols")))

Result

   customer_sample_id Left.Gene.Symbols Right.Gene.Symbols  combination
[1,] "AMLM12001KP"      "AK2"             NA                    AK2*
[2,] "AMLM12001KP"      "HFM1"           "PPT"                  HFM1*PPT
[3,] "AMLM12001KP"      "HFM1"            NA                    HFM1*
[4,] "AMLM12001KP"      "HFM1"           "GGT"                  HFM1*GGT
[5,] "AMLM12001KP"      "HFM1"            NA                    HFM1* 

3 Answers:

Answer 0 (score: 4)

You could do the following, which temporarily replaces the NA values with the empty string "".

cbind(
    dd2, 
    combination = paste(dd2[,2], replace(dd2[,3], is.na(dd2[,3]), ""), sep = "*")
)
#      customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination
# [1,] "AMLM12001KP"      "AK2"             NA                 "AK2*"      
# [2,] "AMLM12001KP"      "HFM1"            "PPT"              "HFM1*PPT"  
# [3,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"     
# [4,] "AMLM12001KP"      "HFM1"            "GGT"              "HFM1*GGT"  
# [5,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"    

Of course, you can replace the column numbers above with the column names. I didn't write them out because they are so long.
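Here is a sketch of the same replace() approach written with the column names instead of positions, as suggested above. The helper variable `right` is introduced only for readability.

```r
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
"Left.Gene.Symbols", "Right.Gene.Symbols")))

# Extract the right-hand column by name
right <- dd2[, "Right.Gene.Symbols"]

result <- cbind(
  dd2,
  combination = paste(dd2[, "Left.Gene.Symbols"],
                      replace(right, is.na(right), ""),  # NA -> "" before pasting
                      sep = "*")
)
result[, "combination"]
# [1] "AK2*"     "HFM1*PPT" "HFM1*"    "HFM1*GGT" "HFM1*"
```

Because replace() returns a modified copy, dd2 itself still keeps its NA values.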

Answer 1 (score: 3)

One way, using ifelse:

ifelse(is.na(dd2[,3]),paste0(dd2[,2],"*"),paste(dd2[,2],dd2[,3],sep="*"))

#[1] "AK2*"     "HFM1*PPT" "HFM1*"    "HFM1*GGT" "HFM1*"
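A sketch of how the ifelse() result could be attached back onto dd2 to reproduce the desired combination column, using column names for clarity. Like the one-liner above, it only checks the right-hand column for NA.

```r
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
"Left.Gene.Symbols", "Right.Gene.Symbols")))

left  <- dd2[, "Left.Gene.Symbols"]
right <- dd2[, "Right.Gene.Symbols"]

# If the right symbol is missing, keep only "left*"; otherwise "left*right"
combination <- ifelse(is.na(right),
                      paste0(left, "*"),
                      paste(left, right, sep = "*"))

result <- cbind(dd2, combination = combination)
print(result)
```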

Answer 2 (score: 3)

We can use NAer from qdap together with sprintf:

library(qdap)
sprintf('%s*%s', dd2[,2],NAer(dd2[,3],''))
#[1] "AK2*"     "HFM1*PPT" "HFM1*"    "HFM1*GGT" "HFM1*"
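If you would rather not depend on qdap, a base-R sketch does the NA replacement by hand before calling sprintf(). The replacement step is needed because sprintf("%s", NA) would otherwise produce the text "NA".

```r
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
"Left.Gene.Symbols", "Right.Gene.Symbols")))

# Base-R stand-in for qdap::NAer(x, ''): copy the column and blank out NAs
right <- dd2[, 3]
right[is.na(right)] <- ""

combination <- sprintf("%s*%s", dd2[, 2], right)
print(combination)
# [1] "AK2*"     "HFM1*PPT" "HFM1*"    "HFM1*GGT" "HFM1*"
```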