Question

我的输入文件有8列。我有38个文件想要合并在一起。输入文件：AAA.out

             pos     gpos         p1        ihh1        p2        ihh2  xpehh
 9.1022217  1022217 1.02222e+06 0.138333    901220  0.0738636   572286  0.454111  

 9.1024910  1024910 1.02491e+06 0.138333    900853  0.0738636   572286  0.453703  

 9.1041353  1041353 1.04135e+06 0.246667    852186  0.0738636   573584  0.3959  

 9.1070162  1070162 1.07016e+06 0.113333    870718  0   583622  0.400065

BBB.out

             pos      gpos          p1       ihh1       p2      ihh2    xpehh
  8.1135641 1135641 1.13564e+06 0.368333    639953  0.352273    512804  0.2215  
  8.1152035 1152035 1.15204e+06 0.00333333  651548  0   540213  0.187389
  8.1158202 1158202 1.1582e+06  0.358333    646188  0   540213  0.179129
  8.1178735 1178735 1.17874e+06 0.01    654438  0.409091    486335  0.29688
  8.1193344 1193344 1.19334e+06 0   651573  0   497049  0.270699
  8.1230464 1230464 1.23046e+06 0.373333    631599  0.505682    482294  0.269701

我尝试通过

合并它们

files <- list.files(pattern = "*.*.out", full.names = TRUE, recursive = FALSE)  
#make a list of all out.files
uridata <- data.frame()
#go through each file, one by one, and add it to the 'uridata' df,   above  
big_list_of_data_frames <- lapply(files, read.table, skip = FALSE,header = TRUE, stringsAsFactors = FALSE)  
big_data_frame <- do.call(rbind,big_list_of_data_frames)
new_fram <- big_data_frame [,c(1,7)]  

the dput:  
structure(list(pos = c(1022217L, 1024910L, 1041353L, 1070162L, 
1089884L), gpos = c(1022220, 1024910, 1041350, 1070160, 1089880
), p1 = c(0.138333, 0.138333, 0.246667, 0.113333, 0.113333), 
    ihh1 = c(901220L, 900853L, 852186L, 870718L, 870014L), p2 =      c(0.0738636, 
0.0738636, 0.0738636, 0, 0), ihh2 = c(572286L, 572286L, 573584L, 
583622L, 583435L), xpehh = c(0.454111, 0.453703, 0.3959, 
0.400065, 0.399577)), class = "data.frame", row.names = c("9.1022217", 
"9.1024910", "9.1041353", "9.1070162", "9.1089884"))

我希望我的输出文件在csv中

    ID             XPEHH  
    9.1022217     0.454111  
    9.1024910     0.453703
    9.1041353     0.3959 
    .
    .
    .
    8.1135641     0.2215

但是，我不知道为什么输入文件中的第一列将成为big_data_fram中的第0列？

您能提供任何建议吗？

Answer 1

您在合并文件的方法上做得很好。您的问题在于如何使用read.table读取文件，因为read.table假设如果缺少第一列名称，则第一列就是行名。看到这里：

> read.table(text=BBB, header=TRUE)
              pos    gpos         p1   ihh1       p2   ihh2    xpehh
8.1135641 1135641 1135640 0.36833300 639953 0.352273 512804 0.221500
8.1152035 1152035 1152040 0.00333333 651548 0.000000 540213 0.187389
8.1158202 1158202 1158200 0.35833300 646188 0.000000 540213 0.179129
8.1178735 1178735 1178740 0.01000000 654438 0.409091 486335 0.296880
8.1193344 1193344 1193340 0.00000000 651573 0.000000 497049 0.270699
8.1230464 1230464 1230460 0.37333300 631599 0.505682 482294 0.269701
> rownames(read.table(text=BBB, header=TRUE))
[1] "8.1135641" "8.1152035" "8.1158202" "8.1178735" "8.1193344" "8.1230464"

嗯，请看?read.table关于row.names参数的内容。 TLDR；通过将其设置为NULL来禁用它。

> read.table(text=BBB, row.names = NULL, header=TRUE)
  row.names     pos    gpos         p1   ihh1       p2   ihh2    xpehh
1 8.1135641 1135641 1135640 0.36833300 639953 0.352273 512804 0.221500
2 8.1152035 1152035 1152040 0.00333333 651548 0.000000 540213 0.187389
3 8.1158202 1158202 1158200 0.35833300 646188 0.000000 540213 0.179129
4 8.1178735 1178735 1178740 0.01000000 654438 0.409091 486335 0.296880
5 8.1193344 1193344 1193340 0.00000000 651573 0.000000 497049 0.270699
6 8.1230464 1230464 1230460 0.37333300 631599 0.505682 482294 0.269701
> rownames(read.table(text=BBB, row.names = NULL, header=TRUE))
[1] "1" "2" "3" "4" "5" "6"

您可以在此处看到第一列的名称方便地命名为“ row.names”。如果列名是预先固定的，则只需提供带有col.names参数的名称向量即可为第一列指定名称。

对于这些示例，我已经使用BBB参数从变量text中的字符串中读取了文件的内容；您将不得不用file参数和文件名代替它。

为什么我的第一列将在数据帧中变为列0？

1 个答案: