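I'm trying to extract part of my data in R in order to solve another problem. I'm not sure how to extract a subset of data that is read in from folders.
My data is currently read in with the following code: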
library(data.table, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
################
## PARAMETERS ##
################
# Set path of major source folder for raw transaction data
in_directory <- "C:/Users/NAME/Documents/Raw Data/"
# List names of sub-folders (currently grouped by first two characters of CUST_ID)
in_subfolders <- list("AA-CA", "CB-HZ", "IA-IL", "IM-KZ", "LA-MI", "MJ-MS",
"MT-NV", "NW-OH", "OI-PZ", "QA-TN", "TO-UZ",
"VA-WA", "WB-ZZ")
# Set location for output
out_directory <- "C:/Users/NAME/Documents/YTD Master/"
out_filename <- "OUTPUT.csv"
# Set beginning and end of date range to be collected - year-month-day format
date_range <- interval(as.Date("2018-01-01"), as.Date("2018-05-31"))
# Enable or disable filtering of raw files to only grab items bought within
# certain months to save space.
# If false, all files will be scanned for unique items, which will take
# longer and be a larger file.
date_filter <- TRUE
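I wish I could provide a dataset so that I could give a reproducible example.
I work with a large amount of data, so I pull the information from a database and store it in folders by date. Everything is then set up so that I can pull whichever dates I need from the data.
I have included more of the code than is strictly necessary, but this is the first part of the script, before I do anything with the data.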
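Since I can't share the real files, here is a minimal sketch of the kind of read-and-filter step these parameters feed into, using made-up files in a temporary folder. The CSV format, the file names, and the column names `CUST_ID`, `ITEM` and `TRANS_DATE` are assumptions for illustration only, not my actual data layout.

```r
library(data.table, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

# Hypothetical stand-in for the raw data folder, built in a temp directory
in_directory <- file.path(tempdir(), "Raw Data")
in_subfolders <- c("AA-CA", "CB-HZ")
for (sub in in_subfolders) {
  dir.create(file.path(in_directory, sub), recursive = TRUE, showWarnings = FALSE)
}

# Two tiny made-up files (file names and column names are assumptions)
fwrite(data.table(CUST_ID = c("AB1", "BC2"),
                  ITEM = c("X", "Y"),
                  TRANS_DATE = c("2018-02-15", "2017-12-30")),
       file.path(in_directory, "AA-CA", "2018-02.csv"))
fwrite(data.table(CUST_ID = c("CD3", "HZ9"),
                  ITEM = c("Z", "X"),
                  TRANS_DATE = c("2018-05-31", "2018-06-01")),
       file.path(in_directory, "CB-HZ", "2018-05.csv"))

# Same date range as in the parameters above
date_range <- interval(as.Date("2018-01-01"), as.Date("2018-05-31"))

# Collect every CSV file under the listed sub-folders
files <- unlist(lapply(in_subfolders, function(sub) {
  list.files(file.path(in_directory, sub), pattern = "\\.csv$", full.names = TRUE)
}))

# Read and stack the files, then keep only rows whose date falls in the range
all_rows <- rbindlist(lapply(files, fread), use.names = TRUE)
all_rows[, TRANS_DATE := as.Date(TRANS_DATE)]
subset_rows <- all_rows[TRANS_DATE %within% date_range]
print(subset_rows)
```

What I am after is essentially `subset_rows`: only part of what was read in from the folders, rather than the full stacked table.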