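I would like to know whether information contained in file names can be assigned to columns of a data table in R.
For example, I have thousands of csv files named like this: 2007-Feb-Reservoir-Rain.csv
What I need is: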
files = list.files()
Answer 0 (score: 2)
Here is an approach using the tidyverse:
library(tidyverse)
# List all csv files including sub-folders
list_of_files <- list.files(path = ".", recursive = TRUE,
                            pattern = "\\.csv$", full.names = TRUE)
# Loop through all the files using map_df, read data using read_csv
# and create a FileName column to store filenames
# Then clean up filename: remove file path and extension
# Finally separate Filename into 4 columns using "-" as separator
df <- list_of_files %>%
  purrr::set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
  purrr::map_df(readr::read_csv, .id = "FileName") %>%
  tidyr::separate(FileName, c("year", "month", "type", "milliliters"), "-")
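To make the file-name cleanup in the pipeline above easier to follow, here is a minimal base R sketch of the same idea: strip the directory and extension, then split on "-". The helper name `split_file_name` and the fourth column name `measure` are hypothetical, and the sketch assumes every file name has exactly four dash-separated parts.

```r
# Hypothetical helper: turn paths such as "./data/2007-Feb-Reservoir-Rain.csv"
# into one row per file with one column per dash-separated part.
split_file_name <- function(path) {
  # Drop the directory and the ".csv" extension
  stem <- tools::file_path_sans_ext(basename(path))
  # Split each stem on a literal "-"
  parts <- strsplit(stem, "-", fixed = TRUE)
  # Stack the parts into a data frame (assumes 4 parts per name)
  out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(out) <- c("year", "month", "type", "measure")
  out$year <- as.integer(out$year)
  out
}

parts <- split_file_name(c("./data/2007-Feb-Reservoir-Rain.csv",
                           "2008-Mar-Reservoir-Rain.csv"))
```

If some names might have fewer or more parts, it would be worth checking `lengths(parts)` inside the helper before stacking.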
Answer 1 (score: 1)
You can use the rbindlist() function from the data.table package to add information about each file while concatenating a list of data.tables:
library(data.table)
# Get a vector of file paths you want to load:
# (the dot is escaped and the pattern anchored so only names ending in "-Rain.csv" match)
files <- list.files(path = ".", pattern = "-Rain\\.csv$")
# Load those files into a list of data.tables:
dt_list <- lapply(files, fread)
# Name each list element after its file of origin:
names(dt_list) <- files
# Concatenate all files into a single data.table, with
# an additional column containing the filename each row
# came from (taken from the names(dt_list))
dt <- rbindlist(dt_list, idcol = "file")
# Split the file name into three new columns:
dt[, year := as.numeric(sapply(strsplit(file, "-"), `[`, 1))]
dt[, month := sapply(strsplit(file, "-"), `[`, 2)]
dt[, type := sapply(strsplit(file, "-"), `[`, 3)]
# Remove the filename column since it's no longer needed
dt[, file := NULL]
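For readers who want to try the idea without installing any packages, the following is a rough base R sketch of the same steps: read each file, tag its rows with the parts of the file name, then bind everything together. It is not the data.table approach itself. The temporary folder, the `value` column, and the helper `read_one` are made up for the demonstration.

```r
# Create two small demo files in a temporary folder (hypothetical data)
tmp <- file.path(tempdir(), "rain_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(value = c(1.2, 3.4)),
          file.path(tmp, "2007-Feb-Reservoir-Rain.csv"), row.names = FALSE)
write.csv(data.frame(value = 5.6),
          file.path(tmp, "2008-Mar-Reservoir-Rain.csv"), row.names = FALSE)

# Collect the files whose names end in "-Rain.csv"
files <- list.files(tmp, pattern = "-Rain\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and add columns parsed from its name
read_one <- function(f) {
  d <- read.csv(f)
  p <- strsplit(tools::file_path_sans_ext(basename(f)), "-", fixed = TRUE)[[1]]
  d$year <- as.integer(p[1])
  d$month <- p[2]
  d$type <- p[3]
  d
}

# Bind all files into a single data frame
combined <- do.call(rbind, lapply(files, read_one))
```

With thousands of files, `do.call(rbind, ...)` is likely to be noticeably slower than rbindlist(), so this is meant only to illustrate the logic.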