Question

I have 20 data files (.txt). My end goal is to select one specific column (say V3) from each of the 20 files and create a new file. I tried temp <- list.files(pattern = '*.snp.blp')

How can I extract V3 from each of the 20 files and combine (cbind) them in R?

Answer 1

We can use fread from data.table, which also has a select argument for reading only the specific columns we need rather than the entire dataset.

library(data.table)
library(purrr)
library(dplyr)
map(temp, fread, select = 'V3') %>%
      bind_cols
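For comparison, the same read-one-column-and-bind idea can be sketched in base R. This is only an illustration under stated assumptions: the temporary directory, the generated file contents, and the names files and v3 are hypothetical and not part of the original answer. It also labels each column with its file name, so the combined result shows where each V3 came from.

```r
# Create three small hypothetical files in a temporary directory
# so that the sketch runs on its own.
tmp_dir <- tempfile("snp_demo")
dir.create(tmp_dir)
for (i in 1:3) {
  d <- data.frame(V1 = i, V2 = i * 10, V3 = i * 100 + 1:4)
  write.csv(d, file.path(tmp_dir, paste0("snp.blp", i, ".csv")),
            row.names = FALSE)
}

files <- list.files(tmp_dir, pattern = "snp\\.blp", full.names = TRUE)

# Read each file and keep only V3; sapply() returns a matrix
# when every file contributes the same number of rows.
v3 <- sapply(files, function(f) read.csv(f)[["V3"]])

# Label each column with the file it came from.
colnames(v3) <- basename(files)
v3
```

Unlike fread with select, read.csv reads the whole file before the column is extracted, so this variant is likely to be slower on large files.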

If the number of rows is not the same across files, use cbind.fill:

out <- map(temp, fread, select = 'V3') 
do.call(rowr::cbind.fill, c(out, fill = NA))
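In case the rowr package is not available in your setup, the same padding can be sketched in base R. The list out below is a hypothetical stand-in for the result of map(temp, fread, select = 'V3'), and the column labels file1, file2, ... are assumptions made for illustration.

```r
# Hypothetical stand-in for the list produced by reading V3 from each file:
# three single-column tables with different numbers of rows.
out <- list(
  data.frame(V3 = c(1, 2, 3)),
  data.frame(V3 = c(4, 5)),
  data.frame(V3 = c(6, 7, 8, 9))
)

# Number of rows in the longest table.
n_max <- max(vapply(out, nrow, integer(1)))

# Pad each V3 column with NA up to n_max.
padded <- lapply(out, function(d) {
  v <- d[["V3"]]
  length(v) <- n_max  # extending a vector's length fills the tail with NA
  v
})

# Bind the padded columns side by side.
combined <- do.call(cbind, padded)
colnames(combined) <- paste0("file", seq_along(out))
combined
```

The result is a matrix with one column per file, where the shorter columns end in NA.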

Data

set.seed(24) 
invisible(map(paste0('snp.blp', 1:3, '.csv'), ~
     matrix(sample(1:10, 10 * 3, replace = TRUE), ncol = 3,
       dimnames = list(NULL, paste0("V", 1:3))) %>% 
                  as_tibble %>%
                  readr::write_csv(., path = .x)))
temp <- list.files(pattern='snp.blp')

Answer 2

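Arguably, it is better to rbind() the rows of the same variable from multiple files than to cbind() them, especially since cbind() fails when the files have different numbers of rows.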

If we only need to combine a single column from multiple files, we can also use unlist() instead of rbind().
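The difference between the two approaches can be sketched on small in-memory data. The list cols and its file names are hypothetical stand-ins for columns read from files of different lengths, and the source column is an added assumption that makes it possible to split the stacked values by file again.

```r
# Hypothetical stand-ins for the V3 columns read from three files
# that have different numbers of rows.
cols <- list(
  a.txt = data.frame(V3 = c(1, 2, 3)),
  b.txt = data.frame(V3 = c(4, 5)),
  c.txt = data.frame(V3 = c(6, 7, 8, 9))
)

# rbind() stacks the rows regardless of how many each file contributes.
stacked <- do.call(rbind, cols)

# Record which file each row came from.
stacked$source <- rep(names(cols), times = vapply(cols, nrow, integer(1)))
rownames(stacked) <- NULL

# unlist() yields the same values as a plain vector
# when only one column is needed.
values <- unlist(lapply(cols, `[[`, "V3"), use.names = FALSE)
```

Here stacked is a long data frame with one row per value, while values is a bare numeric vector holding the same numbers in the same order.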

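A complete working example of combining rows in base R can be built with lapply(), an anonymous function, and unlist(). We will use data from Alex Barradas' Pokémon Stats database on kaggle.com, which I have restructured into six CSV files, one for each of the first six generations of Pokémon.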

download.file("https://raw.githubusercontent.com/lgreski/pokemonData/master/pokemonData.zip",
              "pokemonData.zip",
              method="wininet",mode="wb")
unzip("pokemonData.zip")

thePokemonFiles <- list.files("./pokemonData",
                              full.names=TRUE)
attackStats <- lapply(thePokemonFiles,function(x) {
     # read data and subset to Attack stat using the extract operator [
     read.csv(x)["Attack"]         
})
# unlist to combine into a vector
attackStats <- unlist(attackStats)
# use the data in another R function
hist(attackStats)

...and the output is a histogram of the combined Attack values.
