Replace values in a data frame based on an index column

Time: 2017-08-22 18:04:21

Tags: r

I have a data matrix that looks like this:

> taxmat = matrix(sample(letters, 70, replace = TRUE), nrow = 10, ncol = 7)
> rownames(taxmat) <- paste0("OTU", 1:nrow(taxmat))
> taxmat<-cbind(taxmat,c("Genus","Genus","Genus","Family","Family","Order","Genus","Species","Genus","Species"))
> colnames(taxmat) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species", "Lowest")
> taxmat
      Domain Phylum Class Order Family Genus Species Lowest   
OTU1  "h"    "c"    "q"   "e"   "q"    "w"   "v"     "Genus"  
OTU2  "f"    "y"    "q"   "z"   "p"    "w"   "v"     "Genus"  
OTU3  "w"    "q"    "i"   "i"   "z"    "j"   "f"     "Genus"  
OTU4  "c"    "e"    "f"   "n"   "z"    "b"   "d"     "Family" 
OTU5  "g"    "w"    "q"   "k"   "e"    "x"   "k"     "Family" 
OTU6  "x"    "j"    "l"   "w"   "z"    "o"   "q"     "Order"  
OTU7  "k"    "s"    "j"   "y"   "t"    "a"   "t"     "Genus"  
OTU8  "w"    "u"    "s"   "w"   "g"    "y"   "n"     "Species"
OTU9  "t"    "r"    "t"   "o"   "i"    "l"   "z"     "Genus"  
OTU10 "x"    "p"    "j"   "f"   "k"    "q"   "w"     "Species"

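The column "Lowest" tells me the lowest rank at which I am confident in that row's data. For each row, I want to replace the values in the columns after the one named in "Lowest" with "unknown".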

The expected output for this example is:

       Domain Phylum Class Order Family   Genus     Species       Lowest
 OTU1  "b"    "b"    "v"   "v"   "l"      "n"       "unknown"     "Genus"
 OTU2  "l"    "m"    "w"   "b"   "f"      "y"       "unknown"     "Genus"
 OTU3  "h"    "w"    "n"   "y"   "k"      "f"       "unknown"     "Genus"
 OTU4  "u"    "m"    "p"   "n"   "t"      "unknown" "unknown"     "Family"
 OTU5  "o"    "b"    "q"   "w"   "a"      "unknown" "unknown"     "Family"
 OTU6  "s"    "j"    "l"   "d"   "unknown" "unknown" "unknown"    "Order"
 OTU7  "v"    "y"    "t"   "p"   "s"      "v"       "unknown"     "Genus"
 OTU8  "b"    "r"    "k"   "d"   "q"      "c"       "q"           "Species"
 OTU9  "k"    "h"    "b"   "w"   "h"      "x"       "unknown"     "Genus"
 OTU10 "o"    "p"    "b"   "n"   "k"      "d"       "q"           "Species"

I can get all the indices to replace as a vector using

idx<-lapply(tax$Lowest, grep, colnames(tax))
idx <- as.numeric(unlist(idx))+1

But I am not sure how to replace the values. Thanks for your help!

1 Answer:

Answer 0 (score: 1)

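We can loop over the rows with apply (here m1 is the matrix from the question). For each row, match the last element, i.e. the entry in 'Lowest', against the names of the other elements to get the position of that column, and then replace the values in all later columns of the row with 'unknown'.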

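t(apply(m1, 1, function(x) {
  # first column to overwrite: one past the column named in 'Lowest'
  i1 <- match(x[8], names(x)[-8]) + 1
  # a 'Species' row gives 8, so there is nothing to replace;
  # index 0 makes replace() a no-op for that row
  i1 <- if (i1 > 7) 0 else i1:7
  c(replace(x[-8], i1, "unknown"), x[8])
}))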
#      Domain Phylum Class Order Family    Genus     Species   Lowest   
#OTU1  "b"    "b"    "v"   "v"   "l"       "n"       "unknown" "Genus"  
#OTU2  "l"    "m"    "w"   "b"   "f"       "y"       "unknown" "Genus"  
#OTU3  "h"    "w"    "n"   "y"   "k"       "f"       "unknown" "Genus"  
#OTU4  "u"    "m"    "p"   "n"   "t"       "unknown" "unknown" "Family" 
#OTU5  "o"    "b"    "q"   "w"   "a"       "unknown" "unknown" "Family" 
#OTU6  "s"    "j"    "l"   "d"   "unknown" "unknown" "unknown" "Order"  
#OTU7  "v"    "y"    "t"   "p"   "s"       "v"       "unknown" "Genus"  
#OTU8  "b"    "r"    "k"   "d"   "q"       "c"       "q"       "Species"
#OTU9  "k"    "h"    "b"   "w"   "h"       "x"       "unknown" "Genus"  
#OTU10 "o"    "p"    "b"   "n"   "k"       "d"       "q"       "Species"
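
To see what the index computation does for one row, here is a minimal sketch on a hand-made row. The vector `x` and its values are made up for illustration and are not taken from the question's data.

```r
# Hypothetical single row: seven rank columns plus 'Lowest'
x <- c(Domain = "a", Phylum = "b", Class = "c", Order = "d",
       Family = "e", Genus = "f", Species = "g", Lowest = "Family")

# Position of the trusted rank among the first seven names, plus one.
# "Family" is the 5th name, so i1 is 6.
i1 <- match(x[8], names(x)[-8]) + 1

# Columns i1..7 are overwritten; 'Lowest' is re-attached at the end
res <- c(replace(x[-8], i1:7, "unknown"), x[8])
print(res)
```

Only Genus and Species change here, because they are the columns that come after Family.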

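Another option is to build a row/column index: match the last column of 'm1' against the column names to get the column positions, pair them with the sequence of rows, cbind the two into an index matrix, and assign 'unknown' to those positions in 'm1'.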

lst <- Map(function(x, y) if(x >y) 0 else x:y, match(m1[,8], colnames(m1)[-8])+1, 7)
m1[cbind(rep(seq_len(nrow(m1)), lengths(lst)), unlist(lst))] <- "unknown"
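
A minimal sketch that runs this second approach end to end, assuming `m1` is built the same way as `taxmat` in the question. The seed value, the intermediate names `first` and `idx`, and the final per-row count are additions for illustration only.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
m1 <- matrix(sample(letters, 70, replace = TRUE), nrow = 10, ncol = 7)
rownames(m1) <- paste0("OTU", seq_len(nrow(m1)))
m1 <- cbind(m1, c("Genus", "Genus", "Genus", "Family", "Family",
                  "Order", "Genus", "Species", "Genus", "Species"))
colnames(m1) <- c("Domain", "Phylum", "Class", "Order", "Family",
                  "Genus", "Species", "Lowest")

# First column to overwrite in each row
first <- match(m1[, 8], colnames(m1)[-8]) + 1

# Columns to overwrite per row; 0 marks rows with nothing to replace
lst <- Map(function(x, y) if (x > y) 0 else x:y, first, 7)

# Two-column (row, column) index matrix; rows containing a 0 are ignored
idx <- cbind(rep(seq_len(nrow(m1)), lengths(lst)), unlist(lst))
m1[idx] <- "unknown"

# Count of "unknown" cells per row:
# Genus rows have 1, Family rows 2, the Order row 3, Species rows 0
unknown_per_row <- rowSums(m1[, 1:7] == "unknown")
print(unknown_per_row)
```

The 0 placeholder works because R skips rows of an index matrix that contain a zero, so the 'Species' rows are left untouched.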