My input files have 8 columns. I have 38 files that I want to merge together.
Input file: AAA.out
pos gpos p1 ihh1 p2 ihh2 xpehh
9.1022217 1022217 1.02222e+06 0.138333 901220 0.0738636 572286 0.454111
9.1024910 1024910 1.02491e+06 0.138333 900853 0.0738636 572286 0.453703
9.1041353 1041353 1.04135e+06 0.246667 852186 0.0738636 573584 0.3959
9.1070162 1070162 1.07016e+06 0.113333 870718 0 583622 0.400065
BBB.out
pos gpos p1 ihh1 p2 ihh2 xpehh
8.1135641 1135641 1.13564e+06 0.368333 639953 0.352273 512804 0.2215
8.1152035 1152035 1.15204e+06 0.00333333 651548 0 540213 0.187389
8.1158202 1158202 1.1582e+06 0.358333 646188 0 540213 0.179129
8.1178735 1178735 1.17874e+06 0.01 654438 0.409091 486335 0.29688
8.1193344 1193344 1.19334e+06 0 651573 0 497049 0.270699
8.1230464 1230464 1.23046e+06 0.373333 631599 0.505682 482294 0.269701
I tried to merge them with:
files <- list.files(pattern = "*.*.out", full.names = TRUE, recursive = FALSE)
#make a list of all out.files
uridata <- data.frame()
#go through each file, one by one, and add it to the 'uridata' df, above
big_list_of_data_frames <- lapply(files, read.table, skip = FALSE,header = TRUE, stringsAsFactors = FALSE)
big_data_frame <- do.call(rbind,big_list_of_data_frames)
new_fram <- big_data_frame [,c(1,7)]
The dput output:
structure(list(pos = c(1022217L, 1024910L, 1041353L, 1070162L,
1089884L), gpos = c(1022220, 1024910, 1041350, 1070160, 1089880
), p1 = c(0.138333, 0.138333, 0.246667, 0.113333, 0.113333),
ihh1 = c(901220L, 900853L, 852186L, 870718L, 870014L), p2 = c(0.0738636,
0.0738636, 0.0738636, 0, 0), ihh2 = c(572286L, 572286L, 573584L,
583622L, 583435L), xpehh = c(0.454111, 0.453703, 0.3959,
0.400065, 0.399577)), class = "data.frame", row.names = c("9.1022217",
"9.1024910", "9.1041353", "9.1070162", "9.1089884"))
I would like my output file to be a CSV like this:
ID XPEHH
9.1022217 0.454111
9.1024910 0.453703
9.1041353 0.3959
.
.
.
8.1135641 0.2215
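However, I don't understand why the first column of the input files ends up as "column 0" (the row names) in big_data_frame.
Could you offer any suggestions?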
Answer 0 (score: 1)
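Your approach to merging the files is fine. The problem is in how the files are read with read.table: when the header line has one fewer field than the data rows (that is, the first column has no name), read.table assumes the first column holds the row names. See here: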
> read.table(text=BBB, header=TRUE)
pos gpos p1 ihh1 p2 ihh2 xpehh
8.1135641 1135641 1135640 0.36833300 639953 0.352273 512804 0.221500
8.1152035 1152035 1152040 0.00333333 651548 0.000000 540213 0.187389
8.1158202 1158202 1158200 0.35833300 646188 0.000000 540213 0.179129
8.1178735 1178735 1178740 0.01000000 654438 0.409091 486335 0.296880
8.1193344 1193344 1193340 0.00000000 651573 0.000000 497049 0.270699
8.1230464 1230464 1230460 0.37333300 631599 0.505682 482294 0.269701
> rownames(read.table(text=BBB, header=TRUE))
[1] "8.1135641" "8.1152035" "8.1158202" "8.1178735" "8.1193344" "8.1230464"
See what ?read.table says about the row.names argument. TL;DR: disable this behavior by setting it to NULL.
> read.table(text=BBB, row.names = NULL, header=TRUE)
row.names pos gpos p1 ihh1 p2 ihh2 xpehh
1 8.1135641 1135641 1135640 0.36833300 639953 0.352273 512804 0.221500
2 8.1152035 1152035 1152040 0.00333333 651548 0.000000 540213 0.187389
3 8.1158202 1158202 1158200 0.35833300 646188 0.000000 540213 0.179129
4 8.1178735 1178735 1178740 0.01000000 654438 0.409091 486335 0.296880
5 8.1193344 1193344 1193340 0.00000000 651573 0.000000 497049 0.270699
6 8.1230464 1230464 1230460 0.37333300 631599 0.505682 482294 0.269701
> rownames(read.table(text=BBB, row.names = NULL, header=TRUE))
[1] "1" "2" "3" "4" "5" "6"
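You can see here that the first column has conveniently been named "row.names". If the column names are fixed in advance, you can give the first column a name of your own by supplying a vector of names through the col.names argument.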
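A minimal sketch of the col.names route, assuming the eight-column layout shown in the question. It skips the short header line and names every column explicitly, so read.table has no reason to treat the first column as row names. The name `ID` and the variable `sample_text` are illustrative choices, not part of the original files; `colClasses` is added so the ID stays text and trailing zeros (as in 9.1024910) are not dropped.

```r
# Two rows in the question's layout: 7 header names, 8 data columns.
sample_text <- "pos gpos p1 ihh1 p2 ihh2 xpehh
9.1022217 1022217 1.02222e+06 0.138333 901220 0.0738636 572286 0.454111
9.1024910 1024910 1.02491e+06 0.138333 900853 0.0738636 572286 0.453703"

# Skip the short header and supply all eight names ourselves.
# "ID" is an assumed name for the unnamed first column.
df <- read.table(
  text = sample_text,
  header = FALSE,
  skip = 1,
  col.names = c("ID", "pos", "gpos", "p1", "ihh1", "p2", "ihh2", "xpehh"),
  colClasses = c("character", rep("numeric", 7)),
  stringsAsFactors = FALSE
)

# The two columns the question asks for.
df[, c("ID", "xpehh")]
```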
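For these examples I used the text argument to read the file contents from a string stored in the variable BBB; you will have to replace it with the file argument and the file name.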
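Putting this together for the merge in the question, the workflow might look roughly like the sketch below. It is not the asker's exact setup: the temporary directory, the two small stand-in files, and the output name `merged_xpehh.csv` are assumptions made so the example runs on its own. The `colClasses` argument is optional and only pins the ID column as text.

```r
# Hypothetical stand-ins for two of the 38 .out files, written to a
# temporary directory so the sketch is self-contained.
out_dir <- file.path(tempdir(), "xpehh_demo")
dir.create(out_dir, showWarnings = FALSE)
writeLines(c(
  "pos gpos p1 ihh1 p2 ihh2 xpehh",
  "9.1022217 1022217 1.02222e+06 0.138333 901220 0.0738636 572286 0.454111",
  "9.1024910 1024910 1.02491e+06 0.138333 900853 0.0738636 572286 0.453703"
), file.path(out_dir, "AAA.out"))
writeLines(c(
  "pos gpos p1 ihh1 p2 ihh2 xpehh",
  "8.1135641 1135641 1.13564e+06 0.368333 639953 0.352273 512804 0.2215"
), file.path(out_dir, "BBB.out"))

# list.files() takes a regular expression, not a shell glob;
# "\\.out$" matches file names ending in ".out".
files <- list.files(out_dir, pattern = "\\.out$", full.names = TRUE)

# row.names = NULL keeps the unnamed first column as data, named "row.names".
frames <- lapply(files, read.table, header = TRUE, row.names = NULL,
                 colClasses = c("character", rep("numeric", 7)),
                 stringsAsFactors = FALSE)
merged <- do.call(rbind, frames)

# Keep only the ID and xpehh columns, renamed for the output.
result <- data.frame(ID = merged[["row.names"]],
                     XPEHH = merged[["xpehh"]],
                     stringsAsFactors = FALSE)

# "merged_xpehh.csv" is an assumed output name.
csv_path <- file.path(out_dir, "merged_xpehh.csv")
write.csv(result, csv_path, row.names = FALSE, quote = FALSE)
```

With the real data, `out_dir` would be the folder holding the 38 files, and the `writeLines` calls would be dropped.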