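I have some files whose names contain a YYYYMMDD date code, for example my20150112.csv. How can I write a for loop in R so that R automatically processes the next date once it has finished with the previous one?
Here is my script:
R_script <- function(file){
  read.csv(file)
}
For example, after the script has run R_script("my20150112.csv"), how do I get it to run R_script("my20150111.csv") automatically?
Thanks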
Answer 0 (score: 1)
Here is one approach:
files = dir(pattern = "\\.csv$") #Obtain the names of all files ending in .csv
file_dates = gsub("[^0-9]", "", files) #Keep only the digits in each file name
library(anytime) #We'll use the anytime package
file_dates = anydate(file_dates) #Convert the digit strings to dates
files = files[order(file_dates)] #Order the files according to dates
for (i in seq_along(files)){ #Run your operations; seq_along() is safe when no files match
  df = read.csv(file = files[i])
  #YOUR CODE
}
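If installing the anytime package is not an option, the same ordering can probably be done in base R with as.Date() and an explicit format. The following is only a sketch: the demo directory and file names are made up for illustration, and it assumes every file name contains exactly one eight-digit date code. It sorts newest first, since the question's example (20150112 before 20150111) seems to ask for descending order.

```r
# Hypothetical demo files in a scratch directory (names are illustrative only)
demo_dir <- file.path(tempdir(), "date_sort_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- c("my20150111.csv", "my20150112.csv", "my20141231.csv")
for (f in demo_names) {
  write.csv(data.frame(x = 1:3), file.path(demo_dir, f), row.names = FALSE)
}

# List only names that end in .csv
files <- dir(demo_dir, pattern = "\\.csv$")

# Pull out the eight-digit date code and parse it with base R
date_codes <- regmatches(files, regexpr("[0-9]{8}", files))
file_dates <- as.Date(date_codes, format = "%Y%m%d")

# Newest first, matching the order in the question
files <- files[order(file_dates, decreasing = TRUE)]

for (f in files) {
  df <- read.csv(file.path(demo_dir, f))
  # YOUR CODE
}
```

Dropping decreasing = TRUE gives the oldest-first order used in the answer above.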
Answer 1 (score: 0)
Assuming your individual files all have the same format:
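setwd("<directory where files are>") # Replace with the folder that holds the files
for (x in list.files(pattern = "\\.csv$")) {
  # Not sure if you have headers or not; sep = "," because the files are CSV
  file <- read.table(x, header=TRUE, sep=",")
  # Store each data frame in the global environment under its file name
  assign(x=as.character(x), value=file, envir=.GlobalEnv)
}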
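One alternative to calling assign() into the global environment is to keep all the data frames in a single named list, which tends to be easier to loop over afterwards. A minimal sketch, assuming the files are comma-separated with a header row; the demo directory and file names are invented for the example.

```r
# Hypothetical demo files (illustrative names only)
demo_dir <- file.path(tempdir(), "named_list_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (f in c("my20150111.csv", "my20150112.csv")) {
  write.csv(data.frame(value = c(10, 20)), file.path(demo_dir, f), row.names = FALSE)
}

# full.names = TRUE avoids having to setwd() into the directory
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file; each list element is named after its file
tables <- lapply(paths, read.csv)
names(tables) <- basename(paths)

# Access one data frame by file name
tables[["my20150112.csv"]]
```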
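Answer 2 (score: 0)
Here is an end-to-end solution. The following function validates the file name syntax and checks whether the name contains a valid date. It also includes a date range filter. See the comments inside the function for details.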
processYYYYMMDDFiles <- function(path = getwd(), start, end) {
path <- normalizePath(path)
# The regular expression for a file
regEx = "^my\\d{8}\\.csv$"
# A more precise regular expression for a valid YYYYMMDD
# ^my(19|20)\\d\\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])\\.csv$
# Finding all possible files in the path folder
listFiles = dir(path = path, pattern=regEx)
# Selecting just the files with valid dates (matching the format is not enough: for
# example, 20170229 is syntactically valid but is not a real date)
# Extracting the date information: YYYYMMDD
datesOfFiles <- substring(listFiles, 3, 10)
# Internal function for validating a date
checkDate <- function(YYYMMDDdate) {
return(!is.na(as.Date(as.character(YYYMMDDdate),
tz = 'UTC', format = '%Y%m%d')))
}
# Checking
datesOfFilesCheck <- sapply(datesOfFiles, checkDate)
# Reporting about not valid dates
listFilesNOK = listFiles[datesOfFilesCheck == FALSE]
if (length(listFilesNOK) > 0) {
msg = paste("From folder: '%s' skiping the following files,",
"because they have not a valid date:'%s'")
msg = sprintf(msg, path, toString(listFilesNOK))
message(msg)
}
# Filtering for only valid date within the interval [start, end]
validIdx <- (datesOfFilesCheck == TRUE) &
(datesOfFiles >= start) & (datesOfFiles <= end)
listFiles <- listFiles[validIdx]
listFiles <- listFiles[order(listFiles)] # sorting
nFiles = length(listFiles)
print(sprintf("Processing files from folder: '%s'", path))
for (i in seq_len(nFiles)) { # seq_len() skips the loop when no files are in range
iFile = listFiles[i]
# Here comes the additional tasks for this function
print(sprintf("Processing file: '%s'", iFile))
}
}
Now test the function by creating temporary files in the temp directory:
# Testing
files <- c("my20170101.csv", "my20170110.csv", "my20170215.csv", "my20170229.csv",
"my20170315.csv", "my20170820.csv")
tmpDir <- tempdir()
file.create(file.path(tmpDir, files)) # file.create() has no overwrite argument; it truncates existing files
processYYYYMMDDFiles(path = tmpDir, start="20170101", end="20170330")
print("Removing the testing files...")
print(file.remove(file.path(tmpDir, files)))
It produces the following output:
> source("~/R-workspace/projects/samples/samples/processYYYYMMDDFiles.R", encoding = "Windows-1252")
From folder: 'C:\Users\dleal\AppData\Local\Temp\RtmpoZLcNS' skiping the following files, because they have not a valid date:'my20170229.csv'
[1] "Processing files from folder: 'C:\\Users\\dleal\\AppData\\Local\\Temp\\RtmpoZLcNS'"
[1] "Processing file: 'my20170101.csv'"
[1] "Processing file: 'my20170110.csv'"
[1] "Processing file: 'my20170215.csv'"
[1] "Processing file: 'my20170315.csv'"
[1] "Removing the testing files..."
[1] TRUE TRUE TRUE TRUE TRUE TRUE
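A note on the range filter: start and end are compared with the file dates as character strings. That appears to be safe here because fixed-width YYYYMMDD codes sort the same way as text and as calendar dates. A small sketch to check this, using made-up date codes:

```r
# Made-up date codes, deliberately out of order
codes <- c("20170315", "20161231", "20170101", "20170820")

# Filter with plain string comparison, as the function above does
in_range_chr <- codes >= "20170101" & codes <= "20170330"

# The same filter after converting to Date objects
dates <- as.Date(codes, format = "%Y%m%d")
in_range_date <- dates >= as.Date("2017-01-01") & dates <= as.Date("2017-03-30")

# Both approaches select the same codes
identical(in_range_chr, in_range_date)
codes[in_range_chr]
```

The equivalence relies on every code having exactly eight digits, so it would not carry over to formats such as DDMMYYYY.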
I hope this helps.