Pattern for a specific replacement

Date: 2013-10-31 09:54:33

Tags: r gsub

I have this object:

nr.genes<-structure(c("0", "Pipas_chr3_1145", "Pipas_chr3_1145", "Pipas_chr1-4_0581", 
"Pipas_chr1-4_0582", "Pipas_chr1-4_0584", "Pipas_chr2-1_0006", 
"Pipas_chr2-2_0010", "Pipas_chr3_0002", "0", "0", "Pipas_c034_0013", 
"Pipas_chr1-4_0582", "Pipas_chr3_0002", "Pipas_chr4_0878", "Pipas_chr4_1001", 
"Pipas_chr4_0878", "Pipas_chr4_1001", "0", "Pipas_chr1-4_0581", 
"Pipas_chr1-4_0582", "Pipas_chr2-2_0010", "Pipas_chr3_0002", 
"Pipas_chr4_0878", "Pipas_chr4_1001", "0", "Pipas_c131_0003", 
"Pipas_chr1-1_0281", "Pipas_chr1-3_0004", "Pipas_chr3_1145", 
"Pipas_chr4_0003", "Pipas_c034_0013", "Pipas_c131_0003", "Pipas_chr1-3_0004", 
"Pipas_chr3_1145", "Pipas_chr4_0003", "0"), .Names = c("Nr11", 
"Nr12", "Nr13", "Nr141", "Nr142", "Nr143", "Nr144", "Nr145", 
"Nr146", "Nr21", "Nr22", "Nr23", "Nr241", "Nr242", "Nr311", "Nr312", 
"Nr321", "Nr322", "Nr33", "Nr341", "Nr342", "Nr343", "Nr344", 
"Nr345", "Nr346", "Nr41", "Nr421", "Nr422", "Nr423", "Nr424", 
"Nr425", "Nr431", "Nr432", "Nr433", "Nr434", "Nr435", "Nr44"))

I want to change its names using these:

nr.genes.names<-c("up.p33-dw.p33", "up.p33-dw.p38", "up.p33-dw.p52", "up.p33-dw.p64", 
"up.p38-dw.p33", "up.p38-dw.p38", "up.p38-dw.p52", "up.p38-dw.p64", 
"up.p52-dw.p33", "up.p52-dw.p38", "up.p52-dw.p52", "up.p52-dw.p64", 
"up.p64-dw.p33", "up.p64-dw.p38", "up.p64-dw.p52", "up.p64-dw.p64"
)

So the final result for nr.genes should be:

structure(c("0", "Pipas_chr3_1145", "Pipas_chr3_1145", "Pipas_chr1-4_0581", 
"Pipas_chr1-4_0582", "Pipas_chr1-4_0584", "Pipas_chr2-1_0006", 
"Pipas_chr2-2_0010", "Pipas_chr3_0002", "0", "0", "Pipas_c034_0013", 
"Pipas_chr1-4_0582", "Pipas_chr3_0002", "Pipas_chr4_0878", "Pipas_chr4_1001", 
"Pipas_chr4_0878", "Pipas_chr4_1001", "0", "Pipas_chr1-4_0581", 
"Pipas_chr1-4_0582", "Pipas_chr2-2_0010", "Pipas_chr3_0002", 
"Pipas_chr4_0878", "Pipas_chr4_1001", "0", "Pipas_c131_0003", 
"Pipas_chr1-1_0281", "Pipas_chr1-3_0004", "Pipas_chr3_1145", 
"Pipas_chr4_0003", "Pipas_c034_0013", "Pipas_c131_0003", "Pipas_chr1-3_0004", 
"Pipas_chr3_1145", "Pipas_chr4_0003", "0"), .Names = c("up.p33-dw.p33", 
"up.p33-dw.p38", "up.p33-dw.p52", "up.p33-dw.p64", "up.p33-dw.p64", 
"up.p33-dw.p64", "up.p33-dw.p64", "up.p33-dw.p64", "up.p33-dw.p64", 
"up.p38-dw.p33", "up.p38-dw.p38", "up.p38-dw.p52", "up.p38-dw.p64", 
"up.p38-dw.p64", "up.p52-dw.p33", "up.p52-dw.p33", "up.p52-dw.p38", 
"up.p52-dw.p38", "up.p52-dw.p52", "up.p52-dw.p64", "up.p52-dw.p64", 
"up.p52-dw.p64", "up.p52-dw.p64", "up.p52-dw.p64", "up.p52-dw.p64", 
"up.p64-dw.p33", "up.p64-dw.p38", "up.p64-dw.p38", "up.p64-dw.p38", 
"up.p64-dw.p38", "up.p64-dw.p38", "up.p64-dw.p52", "up.p64-dw.p52", 
"up.p64-dw.p52", "up.p64-dw.p52", "up.p64-dw.p52", "up.p64-dw.p64"
))

But I don't get that result when I try this expression:

attr(nr.genes, "names") <- nr.genes.names[as.integer(gsub('^Nr[0-9]([0-9]).*','\\1',attr(nr.genes, "names")))]

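I think the problem is in the pattern or in the replacement, because only the first 4 names of nr.genes.names get used, and then those names repeat. I have just looked at chartr(), and I don't think it is useful here. How can I improve the gsub call?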
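A quick way to check that suspicion is to look at what the pattern actually extracts. The following is a minimal sketch on a handful of the names above; `old` is an illustrative stand-in for `names(nr.genes)`, not an object from the original post.

```r
# A few of the original names (shortened stand-in for names(nr.genes)).
old <- c("Nr11", "Nr12", "Nr141", "Nr21", "Nr346", "Nr44")

# The pattern skips the first digit after "Nr" and captures only the second,
# so every extracted index falls between 1 and 4.
idx <- as.integer(gsub("^Nr[0-9]([0-9]).*", "\\1", old))
print(idx)
```

Since the index never exceeds 4, only the first four entries of nr.genes.names can ever be selected, which would explain the repetition.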

Edit: I want to replace:

Nr11 by up.p33-dw.p33
Nr12 by up.p33-dw.p38
Nr13 by up.p33-dw.p52
Nr14 by up.p33-dw.p64
Nr21 by up.p38-dw.p33
Nr22 by up.p38-dw.p38
Nr23 by up.p38-dw.p52
Nr24 by up.p38-dw.p64
Nr31 by up.p52-dw.p33
Nr32 by up.p52-dw.p38
Nr33 by up.p52-dw.p52
Nr34 by up.p52-dw.p64
Nr41 by up.p64-dw.p33
Nr42 by up.p64-dw.p38
Nr43 by up.p64-dw.p52
Nr44 by up.p64-dw.p64

1 answer:

Answer 0 (score: 2)

No regexp is needed here. You can use strtrim to trim your names down to the significant characters and then use match, like so:

# Get a vector of the unique elements of the bits of the old names we want to match against
idx <- sort( unique( strtrim(names(nr.genes),4) ) )
#  [1] "Nr11" "Nr12" "Nr13" "Nr14" "Nr21" "Nr22" "Nr23" "Nr24" "Nr31" "Nr32" "Nr33" "Nr34" "Nr41" "Nr42"
# [15] "Nr43" "Nr44"

# Match the old names to their position in the 'idx' vector,
# and use this as an index vector to select the new names
# from 'nr.genes.names'.
# This works because 'nr.genes.names' is sorted the same way
# as 'idx', i.e. in ascending order.
new <- nr.genes.names[ match( strtrim(names(nr.genes),4) , idx ) ]

# Assign the new names
names( nr.genes ) <- new

......
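For comparison, a regex-based variant is also possible. The sketch below is not part of the original answer: it captures both digits after "Nr" and converts them into a position in nr.genes.names. It assumes every name starts with "Nr" followed by two digits between 1 and 4, and it uses a shortened stand-in (`old`) for `names(nr.genes)`; `lev`, `first`, `second` and `pos` are illustrative names.

```r
# Shortened stand-in for names(nr.genes); 'old' is an illustrative name.
old <- c("Nr11", "Nr141", "Nr242", "Nr33", "Nr435", "Nr44")

# Rebuild the 16 replacement names in the same order as in the question.
lev <- c("p33", "p38", "p52", "p64")
nr.genes.names <- paste0("up.", rep(lev, each = 4), "-dw.", rep(lev, times = 4))

# Capture the first and the second digit after "Nr" separately.
first  <- as.integer(sub("^Nr([0-9])[0-9].*$", "\\1", old))
second <- as.integer(sub("^Nr[0-9]([0-9]).*$", "\\1", old))

# Treat the two digits as row and column of a 4 x 4 grid stored row by row.
pos <- (first - 1L) * 4L + second

new <- nr.genes.names[pos]
print(new)
```

Unlike the strtrim/match approach, this does not depend on every prefix from Nr11 to Nr44 being present in the data, but it does hard-code the 4 x 4 layout.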
