R: Replace part of multiple column names in a data frame while keeping their numeric index

Time: 2017-08-06 18:56:55

Tags: r dataframe string-formatting

I have a data.frame named dataOrder whose columns correspond to sample names (n = 384) and whose rows correspond to gene entities (n = 180200).

                sample1  sample2   sample3   sample4   sample5   sample6
ENST00000000233       9        0   3499.51         0         0         0
ENST00000000412       0        0      0.00         0         0         0
ENST00000000442       0        0      0.00         0         0         0
ENST00000001008       0        0      0.00         0         0         0
ENST00000001146       0        0      0.00         0         0         0
ENST00000002125       0        0      0.00         0         0         0

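I want to replace part of each column name (the string "sample") with one of five different prefixes: t1_, t2_, t3_, t4_ and t5_.

I tried replacing the names with the gsub function: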

nameVec <- names(dataOrder)
nameVec <- gsub("sample","t2_",nameVec[1:96])
nameVec <- gsub("sample","t3_",nameVec[97:163])
nameVec <- gsub("sample","t4_",nameVec[164:259])
nameVec <- gsub("sample","t5_",nameVec[260:333])
nameVec <- gsub("sample","t1_",nameVec[334:384])
names(dataOrder) <- nameVec
head(dataOrder)

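However, all of my column names end up replaced with NA.

How can I replace the 'sample' string in the header while keeping the numeric index in each column name?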
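The NA result can be reproduced at small scale. A minimal sketch, using a hypothetical 10-element name vector in place of the 384 real column names, that shows the vector shrinking after the first gsub() call and the next call then indexing past its end:

```r
# Hypothetical stand-in for names(dataOrder): 10 names instead of 384
nameVec <- paste0("sample", 1:10)

# Assigning gsub() on a subset back to nameVec replaces the WHOLE vector,
# so nameVec now holds only the 5 renamed elements
nameVec <- gsub("sample", "t2_", nameVec[1:5])
print(length(nameVec))  # 5

# Indexing past the end of the shortened vector yields NA,
# and gsub() returns NA for NA input, so every element is now NA
nameVec <- gsub("sample", "t3_", nameVec[6:10])
print(nameVec)
```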

                   t1_1    t1_96     t2_97    t2_163    t3_164    t3_259
ENST00000000233       9        0   3499.51         0         0         0
ENST00000000412       0        0      0.00         0         0         0
ENST00000000442       0        0      0.00         0         0         0
ENST00000001008       0        0      0.00         0         0         0
ENST00000001146       0        0      0.00         0         0         0
ENST00000002125       0        0      0.00         0         0         0

Here is a reproducible data example (written by @RuiBarradas):

mydf <-
structure(list(target_id = c("ENST00000000233", "ENST00000000412", 
"ENST00000000442", "ENST00000001008", "ENST00000001146", "ENST00000002125"
), sample1 = c(9L, 0L, 0L, 0L, 0L, 0L), sample10 = c(0L, 0L, 
0L, 0L, 0L, 0L), sample100 = c(3499.51, 0, 0, 0, 0, 0), sample101 = c(0L, 
0L, 0L, 0L, 0L, 0L), sample102 = c(0L, 0L, 0L, 0L, 0L, 0L), sample103 = c(0L, 
0L, 0L, 0L, 0L, 0L)), .Names = c("target_id", "sample1", "sample10", 
"sample100", "sample101", "sample102", "sample103"), class = "data.frame", row.names = c("1:", 
"2:", "3:", "4:", "5:", "6:"))

result <- mydf[-1]
row.names(result) <- mydf$target_id
result
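One caveat about this example: its columns are in alphabetical order (sample1, sample10, sample100, ...), so a column's position and its sample number are not the same. If the five groups are meant to follow the sample number rather than the position, a minimal sketch could read the number out of each name and look up its range with findInterval(). The six-column data frame below is a hypothetical stand-in for result, and the range boundaries and prefix order are assumptions copied from the gsub() calls in the question; the desired output above uses t1_ for 1-96 instead, so adjust prefixes to whichever mapping is intended.

```r
# Hypothetical stand-in with the same column names as the reproducible example
result <- data.frame(sample1 = 9, sample10 = 0, sample100 = 3499.51,
                     sample101 = 0, sample102 = 0, sample103 = 0)

# Numeric part of each name, e.g. "sample100" -> 100
num <- as.integer(sub("^sample", "", names(result)))

# Lower bound of each sample-number range (assumed from the question's code);
# findInterval() returns 1..5, which indexes into the prefix vector
breaks   <- c(1, 97, 164, 260, 334)
prefixes <- c("t2_", "t3_", "t4_", "t5_", "t1_")
prefix   <- prefixes[findInterval(num, breaks)]

# Rename only; the column values are left untouched
names(result) <- paste0(prefix, num)
print(names(result))
```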

Thanks!

1 answer:

Answer 0 (score: 3)

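Each gsub() call returns only the slice it was given (for example, 96 names for nameVec[1:96]), and assigning that result to nameVec replaces the whole vector with the shorter slice. The next call then indexes past the end of the shortened vector, gets NA, and the NAs carry through to the column names. Assign each result back into the same slice instead: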

nameVec <- names(dataOrder)
nameVec[1:96] <- gsub("sample", "t2_", nameVec[1:96])
nameVec[97:163] <- gsub("sample", "t3_", nameVec[97:163])
nameVec[164:259] <- gsub("sample", "t4_", nameVec[164:259])
nameVec[260:333] <- gsub("sample", "t5_", nameVec[260:333])
nameVec[334:384] <- gsub("sample", "t1_", nameVec[334:384])
names(dataOrder) <- nameVec
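The five assignments above can also be collapsed into one vectorized step. A minimal sketch, assuming all 384 columns are named "sample" followed by a number and the blocks are by column position exactly as in the answer; the one-row dataOrder built here is only a hypothetical stand-in for the real data:

```r
# Hypothetical stand-in for dataOrder: one row, 384 columns sample1..sample384
dataOrder <- as.data.frame(matrix(0, nrow = 1, ncol = 384,
                                  dimnames = list(NULL, paste0("sample", 1:384))))

# One prefix per column position, in the same blocks as the answer's gsub() calls:
# 96 + 67 + 96 + 74 + 51 = 384
prefix <- rep(c("t2_", "t3_", "t4_", "t5_", "t1_"),
              times = c(96, 67, 96, 74, 51))

# Guard against miscounted block sizes instead of silently producing NA names
stopifnot(length(prefix) == ncol(dataOrder))

# sub() with an anchored pattern replaces only the leading "sample"
names(dataOrder) <- paste0(prefix, sub("^sample", "", names(dataOrder)))
```

Keeping the block sizes in a single times vector means a miscount fails the length check up front rather than surfacing later as NA column names.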