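Subsetting a vector by date range using a loop

Time: 2014-01-09 23:10:50

Tags: r

My file names contain a date stamp, and I only want to import the files that fall within a certain date range.

First, I load all of the available file names into R as a vector: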

files <- c("FileName_2013_06_10_00_00_00.txt", "FileName_2013_06_11_00_00_00.txt", 
"FileName_2013_06_12_00_00_00.txt", "FileName_2013_06_13_00_00_00.txt", 
"FileName_2013_06_14_00_00_00.txt", "FileName_2013_06_15_00_00_00.txt", 
"FileName_2013_06_16_00_00_00.txt", "FileName_2013_06_17_00_00_00.txt", 
"FileName_2013_06_18_00_00_00.txt", "FileName_2013_06_19_00_00_00.txt", 
"FileName_2013_06_20_00_00_00.txt", "FileName_2013_06_21_00_00_00.txt", 
"FileName_2013_06_22_00_00_00.txt", "FileName_2013_06_23_00_00_00.txt", 
"FileName_2013_06_24_00_00_00.txt", "FileName_2013_06_25_00_00_00.txt", 
"FileName_2013_06_26_00_00_00.txt", "FileName_2013_06_27_00_00_00.txt", 
"FileName_2013_06_28_00_00_00.txt", "FileName_2013_06_29_00_00_00.txt", 
"FileName_2013_06_30_00_00_00.txt", "FileName_2013_07_01_00_00_00.txt", 
"FileName_2013_07_02_00_00_00.txt", "FileName_2013_07_03_00_00_00.txt", 
"FileName_2013_07_04_00_00_00.txt", "FileName_2013_07_05_00_00_00.txt", 
"FileName_2013_07_06_00_00_00.txt", "FileName_2013_07_07_00_00_00.txt", 
"FileName_2013_07_08_00_00_00.txt", "FileName_2013_07_09_00_00_00.txt", 
"FileName_2013_07_10_00_00_00.txt", "FileName_2013_07_11_00_00_00.txt", 
"FileName_2013_07_12_00_00_00.txt", "FileName_2013_07_13_00_00_00.txt", 
"FileName_2013_07_14_00_00_00.txt", "FileName_2013_07_15_00_00_00.txt")

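Each file name follows the pattern FileName_yyyy_mm_dd_HH_MM_SS.txt

Of these, I only want to import the following dates (year, month, and day are the only criteria I am looking at):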

datesub <- c("FileName_2013_06_25_00_00_00.txt", "FileName_2013_06_26_00_00_00.txt", 
"FileName_2013_06_27_00_00_00.txt", "FileName_2013_06_28_00_00_00.txt", 
"FileName_2013_06_29_00_00_00.txt", "FileName_2013_06_30_00_00_00.txt", 
"FileName_2013_07_01_00_00_00.txt", "FileName_2013_07_02_00_00_00.txt", 
"FileName_2013_07_03_00_00_00.txt", "FileName_2013_07_04_00_00_00.txt", 
"FileName_2013_07_05_00_00_00.txt", "FileName_2013_07_06_00_00_00.txt", 
"FileName_2013_07_07_00_00_00.txt")

Taking the subset is easy (files[files %in% datesub]); however, complications arise because the files sometimes come in formats like these:

  • FileName_2013_06_27_12_21_13.txt
  • FileName_2013_06_28_00_00_00comb.txt
  • or any combination of the previous examples.

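I tried using regular expressions to subset the file names before importing the data into R, but things started getting messy once I tried ranges spanning more than two months.

How can I subset the data? I think a for loop might work, but I am not sure.

I am open to any and all suggestions. If my question is not clear enough, please let me know and I will do my best to clarify.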

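1 Answer:

Answer 0 (score: 1)

Use a regular expression to extract the Y_m_d piece from each element of datesub, then use regular expressions again to find the files that match those Y_m_d pieces: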

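# Extract the yyyy_mm_dd piece (capture group 1) from each wanted name
datesubclean <- sapply(
  regmatches(datesub, regexec("^FileName_([0-9]{4}_[0-9]{2}_[0-9]{2})", datesub)),
  `[`, 2L
)
# For each date piece, return the file names that contain it
files.sub <- sapply(datesubclean, grep, x = files, value = TRUE)
unname(files.sub)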
#  [1] "FileName_2013_06_25_00_00_00.txt" "FileName_2013_06_26_00_00_00.txt"
#  [3] "FileName_2013_06_27_00_00_00.txt" "FileName_2013_06_28_00_00_00.txt"
#  [5] "FileName_2013_06_29_00_00_00.txt" "FileName_2013_06_30_00_00_00.txt"
#  [7] "FileName_2013_07_01_00_00_00.txt" "FileName_2013_07_02_00_00_00.txt"
#  [9] "FileName_2013_07_03_00_00_00.txt" "FileName_2013_07_04_00_00_00.txt"
# [11] "FileName_2013_07_05_00_00_00.txt" "FileName_2013_07_06_00_00_00.txt"
# [13] "FileName_2013_07_07_00_00_00.txt"
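
If the wanted dates form one contiguous range, a variant of the same idea is to parse the date piece into a Date and compare it against the two endpoints. This also copes with names whose time part or suffix differs. The following is only a minimal sketch, assuming the FileName_yyyy_mm_dd prefix described in the question; `files_demo`, `start`, `end`, `date_piece`, `file_dates`, and `in_range` are illustrative names that do not appear in the original post:

```r
# Illustrative input: a few names, including the irregular ones from the question
files_demo <- c(
  "FileName_2013_06_24_00_00_00.txt",
  "FileName_2013_06_27_12_21_13.txt",
  "FileName_2013_06_28_00_00_00comb.txt",
  "FileName_2013_07_07_00_00_00.txt",
  "FileName_2013_07_08_00_00_00.txt"
)

# Assumed range endpoints (inclusive)
start <- as.Date("2013-06-25")
end   <- as.Date("2013-07-07")

# Pull out the yyyy_mm_dd piece that follows the "FileName_" prefix
date_piece <- sub("^FileName_([0-9]{4}_[0-9]{2}_[0-9]{2}).*$", "\\1", files_demo)

# Parse it as a Date; names that do not fit the pattern become NA
file_dates <- as.Date(date_piece, format = "%Y_%m_%d")

# Keep the files whose date falls inside the range
in_range <- !is.na(file_dates) & file_dates >= start & file_dates <= end
files_demo[in_range]
# [1] "FileName_2013_06_27_12_21_13.txt"     "FileName_2013_06_28_00_00_00comb.txt"
# [3] "FileName_2013_07_07_00_00_00.txt"
```

Because the comparison is done on Date values rather than on text patterns, the range can span any number of months without the regular expression growing.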

Then all you have to do is loop over the file names and open them.
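
That last step might look like the sketch below. It assumes the files are whitespace-delimited tables that `read.table` can read, which the question does not state, and it writes two tiny stand-in files to a temporary directory so that it runs anywhere. `data_dir`, `selected`, `paths`, `data_list`, and `combined` are illustrative names:

```r
# Stand-in directory and files, created only so the example is self-contained
data_dir <- file.path(tempdir(), "date_subset_demo")
dir.create(data_dir, showWarnings = FALSE)
selected <- c("FileName_2013_06_25_00_00_00.txt",
              "FileName_2013_06_26_00_00_00.txt")
for (f in selected) {
  write.table(data.frame(x = 1:3, y = c(2.5, 3.5, 4.5)),
              file.path(data_dir, f), row.names = FALSE)
}

# Read each selected file into a list of data frames, named by file
paths <- file.path(data_dir, selected)
data_list <- lapply(paths, read.table, header = TRUE)
names(data_list) <- selected

# Optionally stack them into one data frame (assumes identical columns)
combined <- do.call(rbind, data_list)
```

With real data, `selected` would be the vector of matching file names and `data_dir` the folder that holds them.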

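regexec is a regular expression function that lets us retrieve captured matches (the parts of the pattern inside parentheses), and regmatches knows how to read the special object that regexec produces. The first sapply simply takes the second element of each regmatches result because, in addition to the captured sub-pattern, regmatches also returns the full match as the first element.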
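
To make that concrete, here is what the two functions return for a single name. This is a small illustration using one of the irregular names from the question; `one`, `m`, and `parts` are illustrative names:

```r
one <- "FileName_2013_06_27_12_21_13.txt"

# regexec() records the positions of the full match and of each capture group
m <- regexec("^FileName_([0-9]{4}_[0-9]{2}_[0-9]{2})", one)

# regmatches() turns those positions into the matched substrings:
# element 1 is the full match, element 2 is the first capture group
parts <- regmatches(one, m)[[1]]
parts
# [1] "FileName_2013_06_27" "2013_06_27"

# The date piece is therefore the second element
parts[2L]
# [1] "2013_06_27"
```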