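Reading files in R with a for loop

Time: 2017-02-21 22:36:42

Tags: r function for-loop

I have some files whose names contain a YYYYMMDD date code, for example my20150112.csv. How can I write a for loop in R so that R automatically processes the next date once it has finished the previous one? Here is the script: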

R_script <- function(file) {
   read.csv(file)
}

For example, how can I automatically run R_script("my20150111.csv") after the script has run R_script("my20150112.csv")?

Thanks

3 answers:

Answer 0: (score: 1)

Here is one way:

files = dir(pattern = "\\.csv$") # Names of all CSV files in the working directory
file_dates = gsub("[^0-9]", "", files) # Keep only the digits (the YYYYMMDD code) in each name
library(anytime) # anydate() comes from the anytime package
file_dates = anydate(file_dates) # Convert the digit strings to dates
files = files[order(file_dates)] # Order the files chronologically

for (i in seq_along(files)) { # seq_along() also copes with an empty file list
    df = read.csv(file = files[i])
    #YOUR CODE
}
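
The same ordering can be done in base R, without the anytime package. The following is only a sketch under stated assumptions: the file names follow the my<YYYYMMDD>.csv pattern from the question, and the demo files and the temporary folder are made up for illustration.

```r
# Hypothetical demo files in a temporary folder (names follow the question's pattern)
tmp <- tempfile("csvdemo")
dir.create(tmp)
demo_files <- c("my20150112.csv", "my20150111.csv", "my20150110.csv")
for (f in demo_files) {
  write.csv(data.frame(x = 1:2), file.path(tmp, f), row.names = FALSE)
}

# Keep only names of the form my<8 digits>.csv
files <- list.files(tmp, pattern = "^my[0-9]{8}\\.csv$")

# Pull out the eight digits and parse them as dates
file_dates <- as.Date(sub("^my([0-9]{8})\\.csv$", "\\1", files), format = "%Y%m%d")

# Oldest first; use order(file_dates, decreasing = TRUE) for newest first
files <- files[order(file_dates)]

for (f in files) {
  df <- read.csv(file.path(tmp, f))
  # Per-file processing would go here
  message("Read ", f, " with ", nrow(df), " rows")
}
```

If the newest file should be handled first, as the example in the question suggests, passing decreasing = TRUE to order() reverses the sequence.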

Answer 1: (score: 0)

Assuming your individual files all have the same format:

setwd("<directory where files are>")  # Replace with the folder that holds the files
for (x in list.files()) {
  # sep="," because the files are CSV; not sure if you have headers or not
  file <- read.table(x, header=TRUE, sep=",")
  assign(x=as.character(x), value=file, envir=.GlobalEnv)
}
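
Instead of creating one global variable per file with assign(), the tables can be collected in a named list. This is a sketch of that alternative, not the answerer's method; the demo files and their contents are hypothetical.

```r
# Hypothetical demo files in a temporary folder
tmp <- tempfile("listdemo")
dir.create(tmp)
write.csv(data.frame(a = 1:3), file.path(tmp, "my20150111.csv"), row.names = FALSE)
write.csv(data.frame(a = 4:6), file.path(tmp, "my20150112.csv"), row.names = FALSE)

# Full paths, so the working directory does not need to change
paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# One data frame per file, named after the file it came from
tables <- lapply(paths, read.csv)
names(tables) <- basename(paths)

# Look up a single table by file name
print(tables[["my20150112.csv"]])
```

A list keeps the global environment clean and can be passed on to lapply() or do.call(rbind, tables) in one step.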

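Answer 2: (score: 0)

Here is an end-to-end solution. The function below checks the syntax of each file name and whether the name contains a valid date. It also includes a date-range filter. See the comments inside the function for details.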

processYYYYMMDDFiles <- function(path = getwd(), start, end) {
    path <- normalizePath(path)
    # The regular expression for a file
    regEx = "^my\\d{8}\\.csv$"
    # A more precise regular expression for a valid YYYYMMDD
    # ^my(19|20)\\d\\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])\\.csv$

    # Finding all possible files in the path folder
    listFiles = dir(path = path, pattern=regEx)

    # Selecting only files with valid dates (not just the right format, but a real
    # calendar date: 20170229, for example, is syntactically fine but not a valid date)
    # Extracting the date information: YYYYMMDD
    datesOfFiles <- substring(listFiles, 3, 10)

    # Internal function for validating a date
    checkDate <- function(yyyymmdd) {
        # as.Date() returns NA when the string is not a real calendar date
        return(!is.na(as.Date(as.character(yyyymmdd), format = '%Y%m%d')))
    }

    # Checking
    datesOfFilesCheck <- sapply(datesOfFiles, checkDate)

    # Reporting about not valid dates
    listFilesNOK = listFiles[datesOfFilesCheck == FALSE]
    if (length(listFilesNOK) > 0) {
        msg = paste("From folder: '%s' skiping the following files,",
                "because they have not a valid date:'%s'")
        msg = sprintf(msg, path, toString(listFilesNOK))
        message(msg)
    }

    # Filtering for only valid date within the interval [start, end]
    validIdx <- (datesOfFilesCheck == TRUE) & 
        (datesOfFiles >= start) & (datesOfFiles <= end)
    listFiles <- listFiles[validIdx]
    listFiles <- listFiles[order(listFiles)] # sorting
    nFiles = length(listFiles)

    print(sprintf("Processing files from folder: '%s'", path))
    for (i in seq_len(nFiles)) { # seq_len() skips the loop when no file matches
        iFile = listFiles[i]
        # Here comes the additional tasks for this function
        print(sprintf("Processing file: '%s'", iFile))
    }
}

Now test the function by creating temporary files in the temp directory:

# Testing
files <- c("my20170101.csv", "my20170110.csv", "my20170215.csv", "my20170229.csv", 
   "my20170315.csv", "my20170820.csv")
tmpDir <- tempdir()
file.create(file.path(tmpDir, files)) # file.create() has no overwrite argument; existing files are truncated

processYYYYMMDDFiles(path = tmpDir, start="20170101", end="20170330")
print("Removing the testing files...")
print(file.remove(file.path(tmpDir, files)))

It produces the following output:

> source("~/R-workspace/projects/samples/samples/processYYYYMMDDFiles.R", encoding = "Windows-1252")
From folder: 'C:\Users\dleal\AppData\Local\Temp\RtmpoZLcNS' skiping the following files, because they have not a valid date:'my20170229.csv'
[1] "Processing files from folder: 'C:\\Users\\dleal\\AppData\\Local\\Temp\\RtmpoZLcNS'"
[1] "Processing file: 'my20170101.csv'"
[1] "Processing file: 'my20170110.csv'"
[1] "Processing file: 'my20170215.csv'"
[1] "Processing file: 'my20170315.csv'"
[1] "Removing the testing files..."
[1] TRUE TRUE TRUE TRUE TRUE TRUE
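
The skipped file in the output above follows from how as.Date() treats impossible calendar dates when a format is given: it returns NA instead of raising an error. A minimal sketch of that check on its own, with a hypothetical helper name and sample strings:

```r
# Hypothetical helper: TRUE when the string is a real calendar date in YYYYMMDD form
is_valid_yyyymmdd <- function(x) {
  !is.na(as.Date(x, format = "%Y%m%d"))
}

# 2017 is not a leap year, 2016 is
candidates <- c("20170228", "20170229", "20160229")
valid <- is_valid_yyyymmdd(candidates)
print(valid)

# Fixed-width YYYYMMDD strings sort chronologically, so a plain string
# comparison is enough for a range filter
in_range <- candidates >= "20170101" & candidates <= "20170330"
print(candidates[valid & in_range])
```

Because every code has the same width and puts the year first, comparing the strings directly gives the same result as comparing the parsed dates, which is what the start and end filter in the function relies on.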

I hope this helps.