Question

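I have a series of data files with names like

sale20160101.txt,

sales20160102.txt, ...,

sales20171231.

Now I want to read them all and combine them, but I also need a date variable to identify when each record occurred, so the date variable would be 20160101, 20160102, ..., 20161231.

My idea is to:

split each file name into "sales" + "time",

repeat the time to match the length of the data each time I read a file,

cbind the data and the time.

Thanks a lot.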
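The three steps above can be sketched in base R roughly as follows. This is only an illustration: the helper names read_with_date and read_all_sales are hypothetical, and the sketch assumes every file has a header row and an 8-digit date in its name.

```r
# Hypothetical helper: read one file and attach the date taken from its name.
read_with_date <- function(path) {
  # Step 1: split the name, keeping only the 8 digits, e.g. "20160101"
  date <- sub("^\\D*(\\d{8}).*$", "\\1", basename(path))
  # Assumption: each file starts with a header row
  data <- read.table(path, header = TRUE)
  # Steps 2 and 3: repeat the date once per row, then cbind it to the data
  cbind(date = rep(date, nrow(data)), data)
}

# Hypothetical wrapper: apply the helper to every sales file in a folder
read_all_sales <- function(dir = ".") {
  files <- list.files(dir, pattern = "^sales?\\d{8}\\.txt$", full.names = TRUE)
  do.call(rbind, lapply(files, read_with_date))
}

# Usage, with a placeholder path:
# sales <- read_all_sales("path/to/folder")
```

The pattern "^sales?" accepts both the "sale" and "sales" spellings that appear in the file names above.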

Answer 1

I usually do some variation of the following:

# find the files (named 'files' to avoid masking base::ls)
files <- list.files(pattern = '^sales')
# get the dates: drop the extension, then the 'sales' prefix
dates <- sub('^sales', '', tools::file_path_sans_ext(files))

# read in the data
dfs <- lapply(files, read.table)
# match the dates
names(dfs) <- dates

# bind all data together and include the date as a column
df <- dplyr::bind_rows(dfs, .id = 'date')
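The date column produced through .id is a character vector taken from the list names. If real dates are needed, one possible follow-up is to parse it with as.Date. The data frame below is a small stand-in for the combined result, and its amount column is hypothetical.

```r
# Stand-in for the combined data frame; the 'amount' column is hypothetical
df <- data.frame(date = c("20160101", "20160102", "20171231"),
                 amount = c(10, 20, 30),
                 stringsAsFactors = FALSE)

# Parse the yyyymmdd strings into Date objects
df$date <- as.Date(df$date, format = "%Y%m%d")

# Date columns support comparisons, e.g. keeping only rows before 2017
df_2016 <- df[df$date < as.Date("2017-01-01"), ]
```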

Answer 2

We can do this with fread and rbindlist from data.table:

library(data.table)
# find the files whose names start with 'sale', followed by digits,
# and that have a .txt extension
files <- list.files(pattern = "^sale.*\\d+\\.txt", full.names = TRUE)

#get the dates
dates <-   readr::parse_number(basename(files))

#read the files into a list and rbind it 
dt <- rbindlist(setNames(lapply(files, fread), dates), idcol = 'date')
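To see which names the regular expression above would select, it can be tried on a few sample names with grepl. The first three names are taken from the question as written; notes.txt is a hypothetical unrelated file. The last line shows one base R way to keep the dates as character strings, whereas readr::parse_number returns numbers.

```r
# File names as written in the question, plus one unrelated (hypothetical) file
candidates <- c("sale20160101.txt", "sales20160102.txt",
                "sales20171231", "notes.txt")

# Same regular expression as in the list.files() call
matched <- grepl("^sale.*\\d+\\.txt", candidates)

# Names the pattern selects; a name without the .txt extension is skipped
selected <- candidates[matched]

# Strip every non-digit character to keep the dates as character strings
dates <- gsub("\\D", "", selected)
```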

R: read file names and put them into a variable
