I have a series of data files with names like
sale20160101.txt,
sales20160102.txt,...,
sales20171231.
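Now I want to read them all in and combine them, but I also need a date variable that tells me when each record occurred, so the date variable would be 20160101, 20160102, ..., 20161231.
My idea is:
Split the file name into "sales" + "time".
As I read each file, repeat the time once for every row of its data.
cbind the data and the time.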
Thanks a lot.
Answer 0 (score: 1)
I usually do some variation of the following:
# find the files
files <- list.files(pattern = '^sales')
# get the dates by dropping the extension and the 'sales' prefix
dates <- sub('^sales', '', tools::file_path_sans_ext(files))
# read in the data
dfs <- lapply(files, read.table)
# name each data frame by its date
names(dfs) <- dates
# bind all data together and include the date as a column
df <- dplyr::bind_rows(dfs, .id = 'date')
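For comparison, the plan from the question (split the name, repeat the date per row, cbind) can also be sketched in base R alone. This is only an illustration under assumptions: the demo directory, the two tiny files, the `item` and `amount` columns, and the helper `read_with_date` are all hypothetical, and it assumes each file has a header row.

```r
# Hypothetical demo data: two small files written to a temporary directory
demo_dir <- file.path(tempdir(), "sales_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(item = c("a", "b"), amount = c(10, 20)),
            file.path(demo_dir, "sales20160101.txt"), row.names = FALSE)
write.table(data.frame(item = "c", amount = 30),
            file.path(demo_dir, "sales20160102.txt"), row.names = FALSE)

# find the files: 'sales', eight digits, '.txt'
files <- list.files(demo_dir, pattern = "^sales\\d{8}\\.txt$", full.names = TRUE)

# hypothetical helper: read one file and attach its date
read_with_date <- function(path) {
  dat <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  # keep only the digits of the file name, e.g. "20160101"
  date <- gsub("\\D", "", basename(path))
  # repeat the date once per row and bind it to the data
  cbind(date = rep(date, nrow(dat)), dat, stringsAsFactors = FALSE)
}

# stack the per-file data frames into one
combined <- do.call(rbind, lapply(files, read_with_date))
```

Here `combined` has one row per record, with the date taken from the file each row came from.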
Answer 1 (score: 1)
We can do this with fread and rbindlist from data.table:
library(data.table)
# find the files whose names start with 'sale', followed by numbers,
# and that have a .txt extension
files <- list.files(pattern = "^sale.*\\d+\\.txt$", full.names = TRUE)
# get the dates
dates <- readr::parse_number(basename(files))
# read the files into a list and rbind it, keeping the date as an id column
dt <- rbindlist(setNames(lapply(files, fread), dates), idcol = 'date')
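In both answers the date column ends up as text such as "20160101", because it is built from the list names. If real dates are needed (for sorting, filtering by range, or plotting), a conversion along these lines should work. The vector `date_chr` below is a hypothetical stand-in for that column.

```r
# hypothetical stand-in for the 'date' column produced above
date_chr <- c("20160101", "20160102", "20171231")

# parse the yyyymmdd text into Date objects
date_parsed <- as.Date(date_chr, format = "%Y%m%d")

# Date objects support arithmetic and comparisons
days_between <- as.numeric(diff(date_parsed[1:2]))
in_2016 <- date_parsed < as.Date("2017-01-01")
```

On the combined data this would be applied as something like `df$date <- as.Date(df$date, format = "%Y%m%d")`.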