Assigning information from file names to a data table

Date: 2018-02-21 05:04:08

Tags: r

I would like to know whether it is possible to assign information from a file name to a data table in R.

For example, I have thousands of CSV files named in the following format: 2007-Feb-Reservoir-Rain.csv

What I need is to:

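  • Put all the files in a directory into a list, as with files = list.files()
  • Load all of these CSVs at once, passing the information in each file name into my table as variables, alongside the actual data in the file, which is just the rainfall for the given month. I need to split the file name on the dash (-), so that it looks like: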

[image: desired output table]

2 answers:

Answer 0 (score: 2)

Here is an approach using the tidyverse:

library(tidyverse)

# List all csv files including sub-folders
list_of_files <- list.files(path = ".", recursive = TRUE,
                            pattern = "\\.csv$", full.names = TRUE)

# Loop through all the files using map_df, read data using read_csv 
# and create a FileName column to store filenames
# Then clean up filename: remove file path and extension
# Finally separate Filename into 4 columns using "-" as separator
df <- list_of_files %>%
  purrr::set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
  purrr::map_df(readr::read_csv, .id = "FileName") %>% 
  tidyr::separate(FileName, c("year", "month", "type", "milliliters"), "-")
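
For readers who want to see the same idea without any packages, here is a minimal base R sketch. It is not part of the original answer: the temporary folder, the two sample files, the `rain` column, the helper `read_with_name()` and the column name `measure` are all hypothetical, chosen only so that the example runs on its own.

```r
# Hypothetical sample data: two tiny CSV files written to a temporary folder.
demo_dir <- file.path(tempdir(), "rain_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(rain = c(1.2, 3.4)),
          file.path(demo_dir, "2007-Feb-Reservoir-Rain.csv"), row.names = FALSE)
write.csv(data.frame(rain = 5.6),
          file.path(demo_dir, "2008-Mar-Reservoir-Rain.csv"), row.names = FALSE)

# List the CSV files with their full paths.
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and attach the pieces of its name as extra columns.
read_with_name <- function(path) {
  # Drop the folder and the extension, then split the rest on "-".
  parts <- strsplit(tools::file_path_sans_ext(basename(path)), "-", fixed = TRUE)[[1]]
  data <- read.csv(path, stringsAsFactors = FALSE)
  n <- nrow(data)
  data$year    <- rep(as.integer(parts[1]), n)
  data$month   <- rep(parts[2], n)
  data$type    <- rep(parts[3], n)
  data$measure <- rep(parts[4], n)
  data
}

# Stack the per-file data frames into a single table.
rain <- do.call(rbind, lapply(csv_files, read_with_name))
print(rain)
```

With thousands of files, `do.call(rbind, ...)` can be slow, which is one reason the answers here reach for `purrr::map_df()` or `data.table::rbindlist()` instead.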

Answer 1 (score: 1)

You can use the rbindlist() function from the data.table package to add information about each file while concatenating a list of data.tables:

library(data.table)

# Get a vector of file paths you want to load:
files <- list.files(path = ".", pattern = "-Rain\\.csv$")

# Load those files into a list of data.tables:
dt_list <- lapply(files, fread)

# Name each list element after its file of origin:
names(dt_list) <- files

# Concatenate all files into a single data.table, with
# an additional column containing the filename each row 
# came from (taken from the names(dt_list))
dt <- rbindlist(dt_list, idcol = "file")

# Split the file name into three new columns:
dt[, year := as.numeric(sapply(strsplit(file, "-"), `[`, 1))]
dt[, month := sapply(strsplit(file, "-"), `[`, 2)]
dt[, type := sapply(strsplit(file, "-"), `[`, 3)]

# Remove the filename column since it's no longer needed
dt[, file := NULL]
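
As a possible variation on the splitting step, data.table's `tstrsplit()` can split the file name once and fill several columns in a single assignment, rather than calling `strsplit()` separately for each column. This is only a sketch and not part of the original answer: the small `dt` below, with its `rain` column, is a hypothetical stand-in for the table produced by `rbindlist(dt_list, idcol = "file")`.

```r
library(data.table)

# Hypothetical stand-in for the combined table built with rbindlist(..., idcol = "file").
dt <- data.table(
  file = c("2007-Feb-Reservoir-Rain.csv", "2008-Mar-Reservoir-Rain.csv"),
  rain = c(1.2, 5.6)
)

# Split every file name on "-" once and keep the first three pieces.
dt[, c("year", "month", "type") := tstrsplit(file, "-", fixed = TRUE, keep = 1:3)]

# tstrsplit() returns character vectors here, so convert the year explicitly.
dt[, year := as.numeric(year)]

# Drop the file name column now that its parts are stored separately.
dt[, file := NULL]

print(dt)
```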