Question

I have 600 tab-delimited .txt files that look like this:

                       barcode gene.symbol    value
1 TCGA-61-2610-02A-01R-1141-07      15E1.2 -0.78175
2 TCGA-61-2610-02A-01R-1141-07      2'-PDE  -1.0155
3 TCGA-61-2610-02A-01R-1141-07         7A5    0.029
4 TCGA-61-2610-02A-01R-1141-07        A1BG  0.96575
5 TCGA-61-2610-02A-01R-1141-07       A2BP1   -0.301
6 TCGA-61-2610-02A-01R-1141-07         A2M -2.21575

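I want to put all 600 files into a single data frame in which gene.symbol supplies the row names and each file contributes one column of values, named with the first 12 characters of that file's barcode. After searching SO I think I have a loop that does this, with one caveat. This is what I have (I'm still learning R, so the code may look crude):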

n = 600
df <- read.delim(file=paste("agilent1.txt"))
df.tmp <- data.frame()
colnames(df) = c("barcode", "gene.symbol", levels(df$barcode))
df = df[2 :3]

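Once I have a df holding the values from the first file, the loop adds the value column from each of the other files (the files are named agilent1.txt, agilent2.txt, and so on):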

for (i in 2:n) {
  df.tmp <- read.delim(file=paste("agilent", i, ".txt", sep=""))
  a <- as.character(levels(df.tmp$barcode))
  a <- substr(a, 1, 12)
  df <- cbind(df, a = df.tmp$value)
}

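Everything works, except that in the cbind command, a = df.tmp$value makes the column name literally "a" (which makes sense), whereas I want the value of a to be the column name.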

  gene.symbol                 TCGA-61-2614                   a                  a                  a        a
1      15E1.2                      0.80475            -0.47375           -0.26825           -0.13425 -0.78175
2      2'-PDE                   -0.1348125          -0.1565625            0.19475         -0.3819375  -1.0155
3         7A5                       2.2735              2.4405              0.902              1.248    0.029
4        A1BG            0.817166666666667 -0.0471666666666667            -0.1005 -0.283333333333333  0.96575
5       A2BP1           -0.811333333333333   -1.02566666666667 -0.494833333333333             -0.948   -0.301
6         A2M                       -0.719            -1.00575           -1.07275              0.517 -2.21575

This sounds easy, but I can't seem to find the answer. Any help would be greatly appreciated.

Cheers,

Ahmed

Answer 1

If you use the reshape package, you don't need an explicit loop. Here is a two-liner that will do exactly what you want (if I understood you correctly):

require(plyr); require(reshape);
files = paste('agilent', 1:600, '.txt', sep = "") # create list of files
dfs   = ldply(files, read.delim)                  # read files into data frame
cast(dfs, gene.symbol ~ barcode)                  # reshape to required format
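
To see what the reshaped result looks like, and how the 12-character column names could be obtained, here is a toy sketch that needs no files or packages. It is not the answer's exact method: base R's tapply() stands in for cast() so the example runs on a plain R installation, and the full barcode of the second sample, along with its rounded values, is made up for illustration.

```r
# Long-format data, shaped like what ldply(files, read.delim) would return.
# The second barcode is hypothetical; only its 12-character prefix
# appears in the question.
dfs <- data.frame(
  barcode = rep(c("TCGA-61-2610-02A-01R-1141-07",
                  "TCGA-61-2614-02A-01R-1141-07"), each = 3),
  gene.symbol = rep(c("A1BG", "A2BP1", "A2M"), times = 2),
  value = c(0.96575, -0.301, -2.21575, 0.817, -0.811, -0.719),
  stringsAsFactors = FALSE
)

# Keep only the first 12 characters of each barcode.
dfs$barcode <- substr(dfs$barcode, 1, 12)

# Fail loudly if a gene/barcode pair occurs more than once,
# rather than silently aggregating.
only_one <- function(v) {
  stopifnot(length(v) == 1)
  v
}

# One row per gene, one column per barcode (a matrix with dimnames).
wide <- tapply(dfs$value, list(dfs$gene.symbol, dfs$barcode), FUN = only_one)
print(wide)
```

The same substr() line can be run on dfs before the cast() call above, assuming every file contains a barcode column as in the question.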

Answer 2

I'd suggest reading in the 600 data files and stacking them into one data frame:

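myfiles <- list.files()   # assumes the working directory holds only the data files
mydat <- c()
for (i in seq_along(myfiles)) {
    # read.delim, unlike read.table with its defaults, does not treat ' as a
    # quote character, so gene symbols such as 2'-PDE are read correctly
    temp <- read.delim(myfiles[i], header=TRUE)
    mydat <- rbind(mydat, temp)
}

library(reshape2)
newdat <- dcast(mydat, gene.symbol ~ barcode, value.var="value")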

If you want the names to be only 12 characters long, you can follow joran's answer.
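
As a hedged alternative to growing the data frame inside a loop, the files can be read with lapply() and stacked in a single rbind() call. The sketch below writes two tiny tab-delimited files to a temporary directory so that it is self-contained; the directory name, the file contents, and the second barcode are made up for illustration.

```r
# Create a scratch directory with two small tab-delimited files.
tmp <- file.path(tempdir(), "agilent_demo")
dir.create(tmp, showWarnings = FALSE)
genes <- c("2'-PDE", "A1BG", "A2M")

write.table(data.frame(barcode = "TCGA-61-2610-02A-01R-1141-07",
                       gene.symbol = genes,
                       value = c(-1.0155, 0.96575, -2.21575)),
            file.path(tmp, "agilent1.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(barcode = "TCGA-61-2614-02A-01R-1141-07",  # hypothetical
                       gene.symbol = genes,
                       value = c(-0.135, 0.817, -0.719)),
            file.path(tmp, "agilent2.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Pick up only files that match the naming scheme.
myfiles <- list.files(tmp, pattern = "^agilent[0-9]+\\.txt$", full.names = TRUE)

# Read every file, then stack the pieces in one call.
pieces <- lapply(myfiles, read.delim, stringsAsFactors = FALSE)
mydat <- do.call(rbind, pieces)

str(mydat)
```

The resulting mydat is in the same long format as in the loop version, so it can be passed to the reshaping step unchanged.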

Answer 3

You can always set the column name separately at the end of each loop iteration:

df <- cbind(df, a = df.tmp$value)
colnames(df)[i+1] <- a
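
For a version that can be run without the 600 files, here is a minimal sketch of the whole loop on in-memory stand-ins. It uses df[[a]] <- ... as one possible alternative to renaming after cbind(); both put the value of a, not the letter "a", into the column name. The helper make_file(), the second and third barcodes, and the rounded values are made up for illustration, and the sketch assumes, as the original loop does, that every file lists the genes in the same order.

```r
# Hypothetical helper that builds a data frame shaped like one input file.
make_file <- function(barcode, values) {
  data.frame(barcode = barcode,
             gene.symbol = c("A1BG", "A2BP1", "A2M"),
             value = values,
             stringsAsFactors = FALSE)
}

# Stand-ins for agilent1.txt, agilent2.txt, agilent3.txt.
files <- list(
  make_file("TCGA-61-2610-02A-01R-1141-07", c(0.96575, -0.301, -2.21575)),
  make_file("TCGA-61-2614-02A-01R-1141-07", c(0.817, -0.811, -0.719)),
  make_file("TCGA-61-2613-02A-01R-1141-07", c(-0.1005, -0.495, -1.07275))
)

# Start from the first file: gene symbols plus its value column.
first <- files[[1]]
df <- first[, c("gene.symbol", "value")]
colnames(df)[2] <- substr(first$barcode[1], 1, 12)

# Add one column per remaining file, named by the truncated barcode.
for (i in 2:length(files)) {
  df.tmp <- files[[i]]
  a <- substr(unique(as.character(df.tmp$barcode)), 1, 12)
  df[[a]] <- df.tmp$value   # the value of a becomes the column name
}

print(colnames(df))
```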

Combining columns from multiple data.frames with a loop
