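My file names contain date stamps, and I only want to import files within a certain range of dates.
First, I load all of the available file names into R as a vector: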
files <- c("FileName_2013_06_10_00_00_00.txt", "FileName_2013_06_11_00_00_00.txt",
"FileName_2013_06_12_00_00_00.txt", "FileName_2013_06_13_00_00_00.txt",
"FileName_2013_06_14_00_00_00.txt", "FileName_2013_06_15_00_00_00.txt",
"FileName_2013_06_16_00_00_00.txt", "FileName_2013_06_17_00_00_00.txt",
"FileName_2013_06_18_00_00_00.txt", "FileName_2013_06_19_00_00_00.txt",
"FileName_2013_06_20_00_00_00.txt", "FileName_2013_06_21_00_00_00.txt",
"FileName_2013_06_22_00_00_00.txt", "FileName_2013_06_23_00_00_00.txt",
"FileName_2013_06_24_00_00_00.txt", "FileName_2013_06_25_00_00_00.txt",
"FileName_2013_06_26_00_00_00.txt", "FileName_2013_06_27_00_00_00.txt",
"FileName_2013_06_28_00_00_00.txt", "FileName_2013_06_29_00_00_00.txt",
"FileName_2013_06_30_00_00_00.txt", "FileName_2013_07_01_00_00_00.txt",
"FileName_2013_07_02_00_00_00.txt", "FileName_2013_07_03_00_00_00.txt",
"FileName_2013_07_04_00_00_00.txt", "FileName_2013_07_05_00_00_00.txt",
"FileName_2013_07_06_00_00_00.txt", "FileName_2013_07_07_00_00_00.txt",
"FileName_2013_07_08_00_00_00.txt", "FileName_2013_07_09_00_00_00.txt",
"FileName_2013_07_10_00_00_00.txt", "FileName_2013_07_11_00_00_00.txt",
"FileName_2013_07_12_00_00_00.txt", "FileName_2013_07_13_00_00_00.txt",
"FileName_2013_07_14_00_00_00.txt", "FileName_2013_07_15_00_00_00.txt")
Each file name follows the pattern FileName_yyyy_mm_dd_HH_MM_SS.txt
Of these, I only want to import the following dates (Year, Month and Day are the only criteria I am filtering on):
datesub <- c("FileName_2013_06_25_00_00_00.txt", "FileName_2013_06_26_00_00_00.txt",
"FileName_2013_06_27_00_00_00.txt", "FileName_2013_06_28_00_00_00.txt",
"FileName_2013_06_29_00_00_00.txt", "FileName_2013_06_30_00_00_00.txt",
"FileName_2013_07_01_00_00_00.txt", "FileName_2013_07_02_00_00_00.txt",
"FileName_2013_07_03_00_00_00.txt", "FileName_2013_07_04_00_00_00.txt",
"FileName_2013_07_05_00_00_00.txt", "FileName_2013_07_06_00_00_00.txt",
"FileName_2013_07_07_00_00_00.txt")
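Taking a subset is easy (files[files %in% datesub]), but complications arise because the files sometimes come in formats like these: FileName_2013_06_27_12_21_13.txt, FileName_2013_06_28_00_00_00comb.txt. I tried using regular expressions to subset the file names before importing the data into R, but things started to get messy once the range spanned more than two months.
How can I subset the data? I think a for loop could work, but I am not sure.
I am open to any and all suggestions. If my question is not clear enough, please let me know and I will do my best to clarify.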
Answer 0 (score: 1)
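Use a regular expression to extract the Y_m_d piece from each entry of datesub, then use regular expressions again to pick out the files that match those Y_m_d pieces: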
datesubclean <- sapply(
regmatches(datesub, regexec("^FileName_([0-9]{4}_[0-9]{2}_[0-9]{2})", datesub)),
`[`, 2L
)
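# value = TRUE returns the matching file names rather than their indices
files.sub <- sapply(datesubclean, grep, x = files, value = TRUE)
# unlist() keeps the result a flat vector even if one date matches several files
unname(unlist(files.sub))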
# [1] "FileName_2013_06_25_00_00_00.txt" "FileName_2013_06_26_00_00_00.txt"
# [3] "FileName_2013_06_27_00_00_00.txt" "FileName_2013_06_28_00_00_00.txt"
# [5] "FileName_2013_06_29_00_00_00.txt" "FileName_2013_06_30_00_00_00.txt"
# [7] "FileName_2013_07_01_00_00_00.txt" "FileName_2013_07_02_00_00_00.txt"
# [9] "FileName_2013_07_03_00_00_00.txt" "FileName_2013_07_04_00_00_00.txt"
# [11] "FileName_2013_07_05_00_00_00.txt" "FileName_2013_07_06_00_00_00.txt"
# [13] "FileName_2013_07_07_00_00_00.txt"
Then all you have to do is loop through the file names and open them.
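The loop-and-open step could look something like the sketch below. It is only an illustration: read_all is a hypothetical helper, and the choice of read.table with header = TRUE assumes whitespace-delimited files with a header row, which the question does not specify. The demo writes two throwaway files to a temporary directory so that it runs on its own.

```r
# Hypothetical helper: read every existing path with the given reader
# and return a list named by file name. Arguments in ... go to the reader.
read_all <- function(paths, reader = read.table, ...) {
  paths <- unname(unlist(paths))        # accept a list or a named vector
  found <- paths[file.exists(paths)]    # silently skip missing files
  out <- lapply(found, reader, ...)
  names(out) <- basename(found)
  out
}

# Demo data (assumed layout): two small tables written to a temp directory
tmp <- tempfile("demo_dir")
dir.create(tmp)
demo.paths <- file.path(tmp, c("FileName_2013_06_25_00_00_00.txt",
                               "FileName_2013_06_26_00_00_00.txt"))
for (p in demo.paths) {
  write.table(data.frame(x = 1:3, y = c("a", "b", "c")), p, row.names = FALSE)
}

data.list <- read_all(demo.paths, header = TRUE)
```

With the real data, the call would presumably be read_all(files.sub, ...) with whatever reader and arguments fit the actual file format.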
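regexec is a regular expression function that lets us retrieve captured matches (the parts of the pattern inside parentheses), and regmatches knows how to read the special object that regexec produces. The first sapply simply takes the second element of each regmatches result, because regmatches returns the full match as the first element, followed by the captured sub-patterns.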
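Because the question is really about a date range, one possible variation is to convert the captured Y_m_d piece to a Date and compare it against the range endpoints, instead of listing every wanted file in datesub. The sketch below is an assumption-laden illustration rather than part of the original answer: demo.files, range.start and range.end are made-up names, and the sample includes the two irregular names from the question.

```r
# Small sample, including the irregular names mentioned in the question
demo.files <- c("FileName_2013_06_24_00_00_00.txt",
                "FileName_2013_06_27_12_21_13.txt",
                "FileName_2013_06_28_00_00_00comb.txt",
                "FileName_2013_07_07_00_00_00.txt",
                "FileName_2013_07_08_00_00_00.txt")

# Each element of m is c(full match, captured Y_m_d group)
m <- regmatches(demo.files,
                regexec("^FileName_([0-9]{4}_[0-9]{2}_[0-9]{2})", demo.files))
ymd <- sapply(m, `[`, 2L)               # NA for names that do not match

# Parse the captured piece as a Date so ranges can cross month boundaries
file.dates <- as.Date(ymd, format = "%Y_%m_%d")

range.start <- as.Date("2013-06-25")    # assumed endpoints, inclusive
range.end   <- as.Date("2013-07-07")

keep <- !is.na(file.dates) & file.dates >= range.start & file.dates <= range.end
files.in.range <- demo.files[keep]
```

Comparing Date values avoids having to build a regular expression per month, so the same code should work unchanged for ranges spanning several months or years.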