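I'm new to R and am looking for code to process the hundreds of files I have on hand. They are .txt files that contain a few lines of unwanted text followed by columns of data, like this: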
XXXXX
XXXXX
XXXXX
Col1 Col2 Col3 Col4 Col5
1 36 37 35 36
2 34 34 36 37
.
.
1500 34 35 36 35
I wrote the code below to extract selected rows of columns 1 and 5 from a single .txt file, and I would like to loop it over all of the files I have.
data <- read.table(paste("/Users/tan/Desktop/test/01.txt"), skip =264, nrows = 932)
selcol<-c("V1", "V5")
write.table(data[selcol], file="/Users/tan/Desktop/test/01ed.txt", sep="\t")
With the code above, the .txt file now looks like this:
Col1 Col5
300 34
.
.
700 34
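If possible, I would like to combine Col5 from all of the .txt files with a single copy of Col1 (which is identical in every file), so that the result looks like this: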
Col1 Col5a Col5b Col5c Col5d ...
300 34 34 36 37
.
.
700 34 34 36 37
Thanks! Tan
Answer 0 (score: 5)
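OK, I think I've covered all of your questions here, but let me know if I missed anything. The general process we will go through is:
1. Identify the files to read in.
2. Use lapply to iterate over each of those file names, creating a single list object that contains all of the data.
3. Select the columns of interest from each object in the list.
4. Merge the objects together on the common column.
For the purposes of the example, suppose I have four files named file1.txt through file4.txt that all look like this: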
x y y2
1 1 2.44281173 -2.32777987
2 2 -0.32999022 -0.60991623
3 3 0.74954561 0.03761497
4 4 -0.44374491 -1.65062852
5 5 0.79140012 0.40717932
6 6 -0.38517329 -0.64859906
7 7 0.92959219 -1.27056731
8 8 0.47004041 2.52418636
9 9 -0.73437337 0.47071120
10 10 0.48385902 1.37193941
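To follow along, files of this shape can be generated first. This is only a minimal sketch: the directory name `exampleDir` and the seed are made up for illustration, and it assumes the unlabeled leading column in the listing above is R's printed row index rather than a field stored in the file (so each line has three fields: x, y, y2).

```r
# Hypothetical setup: write four small example files to a temporary directory.
set.seed(42)                                          # arbitrary seed, for reproducibility only
exampleDir <- file.path(tempdir(), "merge_example")   # made-up location
dir.create(exampleDir, showWarnings = FALSE)

for (i in 1:4) {
  d <- data.frame(x = 1:10, y = rnorm(10), y2 = rnorm(10))
  # row.names = FALSE: assume the row index is not part of the file itself
  write.table(d, file = file.path(exampleDir, paste0("file", i, ".txt")),
              row.names = FALSE, quote = FALSE)
}

# Confirm what was written
list.files(exampleDir, pattern = "file.*\\.txt$")
```

To run the steps that follow against these files, either call `setwd(exampleDir)` first or list the files with `full.names = TRUE`.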
##1. identify files to read in
filesToProcess <- dir(pattern = "file.*\\.txt$")
> filesToProcess
[1] "file1.txt" "file2.txt" "file3.txt" "file4.txt"
##2. Iterate over each of those file names with lapply
listOfFiles <- lapply(filesToProcess, function(x) read.table(x, header = TRUE))
##3. Select columns x and y2 from each of the objects in our list
listOfFiles <- lapply(listOfFiles, function(z) z[c("x", "y2")])
##NOTE: you can combine steps 2 and 3 by passing the colClasses parameter to read.table.
# colClasses takes one entry per column in the file, in order; "NULL" skips that column.
# That code would be:
listOfFiles <- lapply(filesToProcess, function(x) read.table(x, header = TRUE,
                      colClasses = c("integer", "NULL", "numeric")))
##4. Merge all of the objects in the list together with Reduce.
# x is the common columns to join on
out <- Reduce(function(x,y) {merge(x,y, by = "x")}, listOfFiles)
#clean up the column names
colnames(out) <- c("x", sub("\\.txt", "", filesToProcess))
The result looks like this:
> out
x file1 file2 file3 file4
1 1 -2.32777987 -0.671934857 -2.32777987 -0.671934857
2 2 -0.60991623 -0.822505224 -0.60991623 -0.822505224
3 3 0.03761497 0.049694686 0.03761497 0.049694686
4 4 -1.65062852 -1.173863215 -1.65062852 -1.173863215
5 5 0.40717932 1.189763270 0.40717932 1.189763270
6 6 -0.64859906 0.610462808 -0.64859906 0.610462808
7 7 -1.27056731 0.928107752 -1.27056731 0.928107752
8 8 2.52418636 -0.856625895 2.52418636 -0.856625895
9 9 0.47071120 -1.290480033 0.47071120 -1.290480033
10 10 1.37193941 -0.235659079 1.37193941 -0.235659079
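The same steps could be adapted to the layout in the question, where each file starts with junk lines and the rows of interest are picked with skip and nrows. The sketch below is one possible arrangement, not a definitive implementation: `readCol5`, `combineCol5`, `demoDir` and the toy files are hypothetical names and data invented for illustration, and the defaults `skip = 264` and `nrows = 932` are simply the values from the question. Each Col5 is renamed after its file before merging, so merge never has to disambiguate duplicate column names.

```r
# Read the rows of interest from one file and keep columns 1 and 5.
# Defaults mirror the skip/nrows values used in the question.
readCol5 <- function(path, skip = 264, nrows = 932) {
  d <- read.table(path, skip = skip, nrows = nrows)  # no header read, so columns are V1..V5
  d[c("V1", "V5")]
}

# Read every file, name each Col5 after its file, then merge on Col1.
combineCol5 <- function(paths, ...) {
  pieces <- lapply(paths, readCol5, ...)
  for (i in seq_along(pieces)) {
    tag <- sub("\\.txt$", "", basename(paths[i]))
    names(pieces[[i]]) <- c("Col1", paste0("Col5_", tag))
  }
  Reduce(function(a, b) merge(a, b, by = "Col1"), pieces)
}

# --- Toy demonstration with made-up files: 3 junk lines, a header, 10 data rows ---
demoDir <- file.path(tempdir(), "col5_demo")
dir.create(demoDir, showWarnings = FALSE)
for (i in 1:3) {
  body <- data.frame(1:10, 31:40, 41:50, 51:60, (1:10) * i)
  path <- file.path(demoDir, sprintf("%02d.txt", i))
  writeLines(c("XXXXX", "XXXXX", "XXXXX", "Col1 Col2 Col3 Col4 Col5"), path)
  write.table(body, path, append = TRUE, row.names = FALSE, col.names = FALSE)
}

paths <- list.files(demoDir, pattern = "^[0-9]+\\.txt$", full.names = TRUE)
# skip = 3 junk lines + 1 header line + 2 data rows; then keep the next 5 rows
combined <- combineCol5(paths, skip = 6, nrows = 5)
print(combined)

# Write the combined table once, tab-separated, without row names
write.table(combined, file.path(demoDir, "combined.txt"), sep = "\t",
            row.names = FALSE, quote = FALSE)
```

For the real data, something like `combineCol5(list.files("/Users/tan/Desktop/test", pattern = "\\.txt$", full.names = TRUE))` would be the corresponding call, assuming every file shares the same layout and that previously written output files (such as 01ed.txt) are moved out of that folder first.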