Question

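I would like to know whether information contained in file names can be assigned to a data table in R.

For example, I have thousands of csv files named in the following format: 2007-Feb-Reservoir-Rain.csv

What I need is to:

Put all the files in a directory into a list, with files = list.files()
Load all of these csv files at once, passing the information in each file name into my table as variables, alongside the actual data in the file, which is just the rainfall for the given month. The file-name information needs to be split on the dash (-), so that it looks like: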

Answer 1

Here is an approach using the tidyverse:

library(tidyverse)

# List all csv files including sub-folders
list_of_files <- list.files(path = ".", recursive = TRUE,
                            pattern = "\\.csv$", full.names = TRUE)

# Loop through all the files using map_df, read data using read_csv 
# and create a FileName column to store filenames
# Then clean up filename: remove file path and extension
# Finally separate Filename into 4 columns using "-" as separator
df <- list_of_files %>%
  purrr::set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
  purrr::map_df(readr::read_csv, .id = "FileName") %>% 
  tidyr::separate(FileName, c("year", "month", "type", "milliliters"), "-")
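
To try the same idea on a small case, here is a minimal sketch that writes two demo files to a temporary directory and reads them back. It assumes readr 2.0 or later, where read_csv() accepts a vector of paths and an id argument. The directory name, the rain column, and the names chosen for the third and fourth pieces of the file name are hypothetical and not taken from the question.

```r
library(readr)
library(dplyr)
library(tidyr)

# Hypothetical demo data: two small csv files in a temporary directory
demo_dir <- file.path(tempdir(), "rain_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(rain = c(1.2, 3.4)),
          file.path(demo_dir, "2007-Feb-Reservoir-Rain.csv"))
write_csv(data.frame(rain = 5.6),
          file.path(demo_dir, "2007-Mar-Reservoir-Rain.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# read_csv() row-binds all files; `id` stores each row's source path
df <- read_csv(files, id = "FileName", show_col_types = FALSE) %>%
  # Strip the directory and the extension from the stored path
  mutate(FileName = tools::file_path_sans_ext(basename(FileName))) %>%
  # Split "2007-Feb-Reservoir-Rain" into four character columns
  separate(FileName, c("year", "month", "type", "measure"), sep = "-")

print(df)
```

Note that separate() leaves year as a character column; wrap it in as.integer() inside a mutate() if a numeric year is needed.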

Answer 2

You can use the rbindlist() function from the data.table package to add information about each file while concatenating a list of data.tables:

library(data.table)

# Get a vector of file paths you want to load:
files <- list.files(path = ".", pattern = "-Rain\\.csv$")

# Load those files into a list of data.tables:
dt_list <- lapply(files, fread)

# Name each list element after its file of origin:
names(dt_list) <- files

# Concatenate all files into a single data.table, with
# an additional column containing the filename each row 
# came from (taken from the names(dt_list))
dt <- rbindlist(dt_list, idcol = "file")

# Split the file name into three new columns:
dt[, year := as.numeric(sapply(strsplit(file, "-"), `[`, 1))]
dt[, month := sapply(strsplit(file, "-"), `[`, 2)]
dt[, type := sapply(strsplit(file, "-"), `[`, 3)]

# Remove the filename column since it's no longer needed
dt[, file := NULL]
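
The three strsplit() calls can probably be collapsed into one with data.table's tstrsplit(), which splits once and returns the pieces as separate columns. The sketch below is self-contained and runs against two demo files in a temporary directory; the directory name and the rain column are hypothetical.

```r
library(data.table)

# Hypothetical demo data: two small csv files in a temporary directory
demo_dir <- file.path(tempdir(), "rain_demo_dt")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(rain = c(1.2, 3.4)),
       file.path(demo_dir, "2007-Feb-Reservoir-Rain.csv"))
fwrite(data.table(rain = 5.6),
       file.path(demo_dir, "2008-Mar-Reservoir-Rain.csv"))

files <- list.files(demo_dir, pattern = "-Rain\\.csv$", full.names = TRUE)

# Read every file and name each element by its bare file name
dt_list <- lapply(files, fread)
names(dt_list) <- basename(files)
dt <- rbindlist(dt_list, idcol = "file")

# Split the file name once and keep only the first three pieces
dt[, c("year", "month", "type") := tstrsplit(file, "-", fixed = TRUE, keep = 1:3)]
dt[, year := as.integer(year)]
dt[, file := NULL]

print(dt)
```

Using basename() for the names keeps directory components out of the split, which matters as soon as full.names = TRUE or recursive = TRUE is used.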

Assigning information from file names to a data table
