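I'm new to R and am trying to write an R function that parses a directory containing several subdirectories named after time periods. I want to determine the smallest set of subdirectories that together form one continuous time period. The function should return a character vector with the names of the subdirectories of interest.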
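An example: suppose the directory "~" contains the following 6 subdirectories, each named ID_start_end with the start and end dates in "ddmmyy" format: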
- "01_231014_190115"
- "02_231014_190215"
- "03_190215_200215"
- "04_200215_220215"
- "05_220215_130315"
- "06_220215_270315"
The function would return:
"02_231014_190215", "03_190215_200215", "04_200215_220215", "06_220215_270315"
So far I have only managed to extract the start and end dates cleanly, using this code:
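# names of everything in the home directory (note: list.files() also returns plain files)
foldernames <- list.files("~")
# split e.g. "02_231014_190215" into "02", "231014", "190215"
listsplitted <- strsplit(foldernames, "_")
df <- data.frame(matrix(unlist(listsplitted), nrow = length(foldernames), byrow = TRUE))
colnames(df) <- c("ID", "D.start", "D.end")
# parse the ddmmyy strings into Date objects
df[, 2:3] <- lapply(df[, 2:3], as.Date, format = "%d%m%y", origin = "01-01-2000")
# subtracting two Date columns gives a difftime measured in days
df$d.range <- df[, 3] - df[, 2]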
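The matrix/unlist round trip can be shortened by stacking the split pieces directly into a character matrix. A minimal sketch, assuming every name has exactly two underscores; the vector is hard-coded in place of list.files("~") so the example runs anywhere, and the names folders and parts are illustrative only.

```r
# Hard-coded names stand in for list.files("~") so the example is reproducible.
folders <- c("01_231014_190115", "02_231014_190215", "03_190215_200215",
             "04_200215_220215", "05_220215_130315", "06_220215_270315")

# Split each name on "_" and stack the pieces into a 3-column character matrix.
parts <- do.call(rbind, strsplit(folders, "_", fixed = TRUE))

df <- data.frame(ID      = parts[, 1],
                 D.start = as.Date(parts[, 2], format = "%d%m%y"),
                 D.end   = as.Date(parts[, 3], format = "%d%m%y"),
                 stringsAsFactors = FALSE)

# Subtracting two Date columns gives a difftime measured in days.
df$d.range <- df$D.end - df$D.start
```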
It currently returns:
> df
ID D.start D.end d.range
1 01 2014-10-23 2015-01-19 88 days
2 02 2014-10-23 2015-02-19 119 days
3 03 2015-02-19 2015-02-20 1 days
4 04 2015-02-20 2015-02-22 2 days
5 05 2015-02-22 2015-03-13 19 days
6 06 2015-02-22 2015-03-27 33 days
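The d.range column prints as "88 days" because it is a difftime object. If plain numbers are preferred, as.numeric() with an explicit unit strips the class; a small sketch using two of the date pairs from the table above. Because both inputs are Date objects, the differences come out as whole days.

```r
start <- as.Date(c("2014-10-23", "2015-02-19"))
end   <- as.Date(c("2015-01-19", "2015-02-20"))

d <- difftime(end, start, units = "days")  # a difftime object, printed with its unit
as.numeric(d, units = "days")              # plain numeric vector: 88 1
```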
I would appreciate a little help with this.
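One way to frame the selection is as a greedy interval cover: begin at the earliest start date and repeatedly take, among the folders that start on or before the end of the period covered so far, the one that reaches furthest. The sketch below is only an illustration under stated assumptions: select_contiguous() is a hypothetical name, every input name is assumed to look like ID_ddmmyy_ddmmyy (other names would produce NA dates), and a folder that starts on the day the previous one ends is treated as continuous. If there is a gap, it stops there and returns only the run that begins at the earliest date.

```r
# select_contiguous() is a hypothetical helper, not part of base R or any package.
# Assumes names look like "ID_ddmmyy_ddmmyy" (e.g. "02_231014_190215").
select_contiguous <- function(folders) {
  parts <- do.call(rbind, strsplit(folders, "_", fixed = TRUE))
  start <- as.Date(parts[, 2], format = "%d%m%y")
  end   <- as.Date(parts[, 3], format = "%d%m%y")

  chosen <- integer(0)
  reach  <- min(start)  # coverage begins at the earliest start date
  repeat {
    # folders that begin on or before the current end of coverage and extend past it
    cand <- which(start <= reach & end > reach)
    if (length(cand) == 0) break  # nothing extends the period any further
    best   <- cand[which.max(as.numeric(end[cand]))]  # the one that reaches furthest
    chosen <- c(chosen, best)
    reach  <- end[best]
  }
  folders[chosen]
}

select_contiguous(c("01_231014_190115", "02_231014_190215", "03_190215_200215",
                    "04_200215_220215", "05_220215_130315", "06_220215_270315"))
# returns "02_231014_190215" "03_190215_200215" "04_200215_220215" "06_220215_270315"
```

Ties in end date are resolved in favour of the first matching folder, since which.max() returns the first maximum.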
Answer 0 (score: 0)
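Edit: this might be one approach. It parses each folder name and then, for each distinct start date, keeps only the folder with the longest range.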
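Here I built dir_list by hand from the names in your question, but you can get the list of directories with list.dirs(), using recursive = FALSE so that subdirectories inside those directories are not listed. Also pass full.names = FALSE: by default list.dirs() returns paths such as "./01_231014_190115", and the ID extracted below would then be NA.
#dir_list = list.dirs(path = ".", full.names = FALSE, recursive = FALSE)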
dir_list = c("01_231014_190115", "02_231014_190215" , "03_190215_200215", "04_200215_220215", "05_220215_130315" , "06_220215_270315")
df1 <- data.frame(ID = integer(), D.start = character(), D.end = character(), d.range = numeric(), stringsAsFactors = FALSE)
counter = 0
for( i in dir_list){
counter = counter + 1
# the name is ID_start_end: capture groups 1, 3 and 5 hold the three fields
id = as.integer(sub("(.*)(_)(.*)(_)(.*)", '\\1', i))
start_date = sub("(.*)(_)(.*)(_)(.*)", '\\3', i)
start_date = as.character(as.Date(start_date, format = "%d%m%y", origin="01-01-2000"))
end_date = sub("(.*)(_)(.*)(_)(.*)", '\\5', i)
end_date = as.character(as.Date(end_date, format = "%d%m%y", origin="01-01-2000"))
df1[counter,1] = id
df1[counter,2:3] = c(start_date, end_date)
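# difftime() turns the date strings into local-time POSIXct values, so depending on the time zone a
# daylight-saving change inside an interval shows up as a fractional day (88.04167 instead of 88 below)
df1[counter,4] = as.numeric(difftime(end_date, start_date))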
}
uniq_start_dates = unique(df1[,2])
df3 <- data.frame(ID = integer(), D.start = character(), D.end = character(), d.range = numeric(), stringsAsFactors = FALSE)
# for each start date, keep only the folder with the longest range
for(j in uniq_start_dates){
df2 = df1[which(df1[,2] %in% j), ]
df3 <- do.call("rbind", list(df3, head(df2[with(df2, order(d.range, decreasing = TRUE)), ], 1)))
}
rm("counter", "id", "end_date", "start_date", "dir_list", "j", "i", "df1", "df2", "uniq_start_dates")
Output:
print(df1)
ID D.start D.end d.range
1 1 2014-10-23 2015-01-19 88.04167
2 2 2014-10-23 2015-02-19 119.04167
3 3 2015-02-19 2015-02-20 1.00000
4 4 2015-02-20 2015-02-22 2.00000
5 5 2015-02-22 2015-03-13 18.95833
6 6 2015-02-22 2015-03-27 32.95833
print(df3)
ID D.start D.end d.range
2 2 2014-10-23 2015-02-19 119.04167
3 3 2015-02-19 2015-02-20 1.00000
4 4 2015-02-20 2015-02-22 2.00000
6 6 2015-02-22 2015-03-27 32.95833
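The two loops can also be written without loops: sort by start date with the longest range first, then keep the first row of each start date with duplicated(). The sketch below follows the same longest-per-start-date rule as the code above, keeps the folder name alongside the parsed fields, and uses Date arithmetic so d.range stays in whole days. Because that rule does not by itself guarantee a contiguous or minimal selection, it also includes a check, is_contiguous(), a hypothetical helper, that each selected folder starts where the previous one ends.

```r
dir_list <- c("01_231014_190115", "02_231014_190215", "03_190215_200215",
              "04_200215_220215", "05_220215_130315", "06_220215_270315")

parts <- do.call(rbind, strsplit(dir_list, "_", fixed = TRUE))
df1 <- data.frame(Folder  = dir_list,
                  ID      = as.integer(parts[, 1]),
                  D.start = as.Date(parts[, 2], format = "%d%m%y"),
                  D.end   = as.Date(parts[, 3], format = "%d%m%y"),
                  stringsAsFactors = FALSE)
# Date minus Date is an exact number of days, so no daylight-saving fractions.
df1$d.range <- as.numeric(df1$D.end - df1$D.start)

# Sort by start date, longest range first within each start date ...
df1 <- df1[order(df1$D.start, -df1$d.range), ]
# ... then keep the first (longest) row for every start date.
df3 <- df1[!duplicated(df1$D.start), ]

# Hypothetical check: every selected folder must start where the previous one ends.
is_contiguous <- function(start, end) {
  all(head(end, -1) == tail(start, -1))
}

is_contiguous(df3$D.start, df3$D.end)  # TRUE for this example
df3$Folder  # the selected folder names
```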