I want to create a data.frame by merging all the files in a folder. Each file in the folder has the following format.
sample.1 =
gene_id normalized_count
ABCB7|22 536.0631
ABCB8|11194 504.5299
ABCB9|23457 147.6550
ABCC10|89845 458.8775
ABCC11|85320 5.6477
sample.n =
gene_id normalized_count
ABCB7|22 122.3673
ABCB8|11194 849.9824
ABCB9|23457 169.9023
ABCC10|89845 0.0000
ABCC11|85320 2.8239
When building the new data.frame, each file's normalized_count should be added as a new column, with rows matched on gene_id. The name of each new column should be the name of the file it came from.
desired output =
gene_id sample.1 sample.n
ABCB7|22 536.0631 122.3673
ABCB8|11194 504.5299 849.9824
ABCB9|23457 147.6550 169.9023
ABCC10|89845 458.8775 0.0000
ABCC11|85320 5.6477 2.8239
I tried this to create the new data.frame:
file_list <- list.files("./")
dataset <- do.call("cbind", lapply(file_list, FUN = function(files) {
  read.table(files, header = TRUE, sep = "\t")
}))
Answer 0 (score: 2)
I created some ".txt" files from your example:
file_list <- list.files("./")[15:16]
> file_list
[1] "sample.1.txt" "sample.n.txt"
Then:
dataset <- Reduce(function(x, y) merge(x, y, by="gene_id"),
lapply(file_list,FUN=function(files){
read.table(files,header=TRUE, sep="")
}))
names(dataset)[-1] <- gsub("[.]txt", "", file_list)
> dataset
gene_id sample.1 sample.n
1 ABCB7|22 536.0631 122.3673
2 ABCB8|11194 504.5299 849.9824
3 ABCB9|23457 147.6550 169.9023
4 ABCC10|89845 458.8775 0.0000
5 ABCC11|85320 5.6477 2.8239
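The same idea can be made a little more robust when there are more than two files. The following is a minimal sketch, not the answerer's code: it assumes tab-separated files ending in ".txt", writes two small example files to a temporary directory so it can run on its own, and uses a hypothetical helper `read_sample()` that names each count column after its file before merging. Selecting files with `pattern` avoids indexing by position, and renaming before the merge avoids the duplicated `normalized_count.x` / `normalized_count.y` names that `merge()` would otherwise generate.

```r
# Build two small example files in a temporary directory (values copied from
# the question) so the sketch is self-contained. The directory name is arbitrary.
dir <- file.path(tempdir(), "merge_demo")
dir.create(dir, showWarnings = FALSE)
writeLines(c("gene_id\tnormalized_count",
             "ABCB7|22\t536.0631",
             "ABCB8|11194\t504.5299"),
           file.path(dir, "sample.1.txt"))
writeLines(c("gene_id\tnormalized_count",
             "ABCB7|22\t122.3673",
             "ABCB8|11194\t849.9824"),
           file.path(dir, "sample.n.txt"))

# Select only the .txt files instead of indexing list.files() by position.
file_list <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read one file and name its count column after the
# file (without the ".txt" extension).
read_sample <- function(path) {
  df <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  names(df)[2] <- sub("\\.txt$", "", basename(path))
  df
}

# Merge all tables on gene_id. all = TRUE keeps genes that are missing from
# some files (filled with NA); drop it to keep only genes present in every file.
dataset <- Reduce(function(x, y) merge(x, y, by = "gene_id", all = TRUE),
                  lapply(file_list, read_sample))
print(dataset)
```

Note that `merge()` sorts the result by `gene_id` by default, so the row order may differ from the order in the input files.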
Using