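I have several .csv files of minute-by-minute pumping rates, all in the same format. I need to sum the total pumping for each day. Some of the data contain text errors (the data logger was switched off) or negative values (which mean the flow was 0). I wrote the code below to do this. How can I loop over multiple files instead of copying and pasting the code for each month? All files are named "Mon_Year_Well_Flows.csv". I tried a for loop and lapply without success. I'm also very new to R, so I realize my code may be inefficient.
First rows of the first month's data file, "Jul_2017_Well_Flows.csv":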
Date DW_20 DW_24A DW_25A DW_26A DW_27A DW_28 DW_29
9/1/18 0:00 995.88 1110.62 1229.14 -0.09 4.5 1100.95 913.33
9/1/18 0:01 1002.43 1115.85 1231.59 -0.09 4.5 1107.63 909.06
9/1/18 0:02 1007.01 1123.39 1236.75 -0.09 4.51 1108.37 935
9/1/18 0:03 1007.17 1121.69 1234.58 -0.09 4.52 1105.64 901.35
9/1/18 0:04 1005.27 1122.86 1233.25 -0.09 4.53 1107.56 911.15
9/1/18 0:05 1001.37 1116.39 1229.89 -0.09 4.54 1103.66 937.93
Code for the first month's data file:
library(dplyr)
# Load data; text columns are read as factors, which the is.factor() check below relies on
data <- read.csv("Jul_2017_Well_Flows.csv", header = TRUE, stringsAsFactors = TRUE)
# Keep a copy of the original date column
data1 <- data.frame("Date" = data$Date)
# Convert factor columns to numeric; error text becomes NA
index <- sapply(data, is.factor)
data[index] <- lapply(data[index], function(x) as.numeric(as.character(x)))
# Convert all NA values to 0
data[is.na(data)] <- 0
# Convert all negative pumping rates to 0
data[, -1][data[, -1] < 0] <- 0
# Add back the original date column
data <- select(data, -Date)
data <- bind_cols(data, data1)
# Drop the time of day and convert to Date
# (use '%m/%d/%y' if the file stores two-digit years, as in the sample rows)
data$Day <- as.Date(data$Date, '%m/%d/%Y')
Jul_2017 <- data %>%
  # Remove the original date column
  select(-Date) %>%
  # Group all data by day
  group_by(Day) %>%
  # Sum the well data for each day
  summarize_all(sum)
After copying and pasting the code above for each month, I bind all the outputs together like this:
combined <- bind_rows(Jul_2017, Aug_2017....)
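Answer 0 (score: 0)
I'll answer this part of the question:
How can I loop over multiple files instead of copying and pasting for each month?
To get started, one approach is to first get a list of the file names in the directory. Try: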
library(dplyr)
# list.files() expects a regular expression, not a glob such as "*.csv"
filenames <- list.files("temp", pattern = "\\.csv$", full.names = TRUE)
# Load data
data <- read.csv(filenames[[1]], header = TRUE, stringsAsFactors = TRUE) # read in the first file as usual
# Keep a copy of the original date column
data1 <- data.frame("Date" = data$Date)
# Convert factor columns to numeric; error text becomes NA
index <- sapply(data, is.factor)
data[index] <- lapply(data[index], function(x) as.numeric(as.character(x)))
# Convert all NA values to 0
data[is.na(data)] <- 0
# Convert all negative pumping rates to 0
data[, -1][data[, -1] < 0] <- 0
# Add back the original date column
data <- select(data, -Date)
data <- bind_cols(data, data1)
# Drop the time of day and convert to Date
# (use '%m/%d/%y' if the file stores two-digit years)
data$Day <- as.Date(data$Date, '%m/%d/%Y')
Jul_2017 <- data %>%
  # Remove the original date column
  select(-Date) %>%
  # Group all data by day
  group_by(Day) %>%
  # Sum the well data for each day
  summarize_all(sum)
# Create a storage place for the combined data frames.
# bind_rows() also accepts a single data frame.
combined <- bind_rows(Jul_2017)
for (i in seq_along(filenames)[-1]) { # every file after the first
  # Load data for file i (not the first file again)
  data <- read.csv(filenames[[i]], header = TRUE, stringsAsFactors = TRUE)
  # Keep a copy of the original date column
  data1 <- data.frame("Date" = data$Date)
  # Convert factor columns to numeric; error text becomes NA
  index <- sapply(data, is.factor)
  data[index] <- lapply(data[index], function(x) as.numeric(as.character(x)))
  # Convert all NA values to 0
  data[is.na(data)] <- 0
  # Convert all negative pumping rates to 0
  data[, -1][data[, -1] < 0] <- 0
  # Add back the original date column
  data <- select(data, -Date)
  data <- bind_cols(data, data1)
  # Drop the time of day and convert to Date
  # (use '%m/%d/%y' if the file stores two-digit years)
  data$Day <- as.Date(data$Date, '%m/%d/%Y')
  temp_month <- data %>% # Notice the temp_month
    # Remove the original date column
    select(-Date) %>%
    # Group all data by day
    group_by(Day) %>%
    # Sum the well data for each day
    summarize_all(sum)
  # Append this month's daily totals to the running result
  combined <- bind_rows(combined, temp_month)
}
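As a variation on the loop above, the per-file results can be collected in a list and bound once at the end, which avoids treating the first file as a special case. This is only a minimal sketch: the directory name `loop_demo`, the two tiny CSV files, and their values are made up for illustration, and base R's `rbind()` stands in for `bind_rows()` so the example runs without dplyr.

```r
# Hypothetical demo files written to a temporary directory
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = "7/1/17 0:00", DW_20 = 5),
          file.path(demo_dir, "Jul_2017_Well_Flows.csv"), row.names = FALSE)
write.csv(data.frame(Date = "8/1/17 0:00", DW_20 = 7),
          file.path(demo_dir, "Aug_2017_Well_Flows.csv"), row.names = FALSE)

filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Pre-allocate one slot per file, then fill each slot inside the loop
monthly <- vector("list", length(filenames))
for (i in seq_along(filenames)) {
  # The per-file cleaning and daily summing would go here
  monthly[[i]] <- read.csv(filenames[[i]], header = TRUE)
}

# Bind all per-file results in a single step
combined <- do.call(rbind, monthly)
```

Because `seq_along()` is used, the loop also behaves sensibly when the folder holds only one file or none at all.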
Answer 1 (score: 0)
I had to change your code because it contained several errors.
A simple approach is to first write a function that wraps your code:
library(dplyr)
process_csv <- function(file) {
  # Load data; keep text as character rather than factor
  data <- read.csv(file, header = TRUE, stringsAsFactors = FALSE)
  # Parse the timestamp column
  # (use "%m/%d/%y %H:%M" if the file stores two-digit years, as in the question's sample rows)
  data$Date <- as.POSIXct(data$Date, format = "%m/%d/%Y %H:%M", tz = "GMT")
  # Clean every column except the first one (Date)
  numeric_columns <- 2:ncol(data)
  for (col in numeric_columns) {
    # Convert character columns to numeric;
    # error text is coerced to NA (this generates a warning)
    data[, col] <- as.numeric(data[, col])
    # Convert all NA values to 0
    data[is.na(data[, col]), col] <- 0
    # Convert all negative pumping rates to 0
    data[data[, col] < 0, col] <- 0
  }
  # Drop the time of day, keeping only the date
  data$Day <- as.Date(data$Date, tz = "GMT")
  out <- data %>%
    # Remove the timestamp column
    select(-Date) %>%
    # Group all data by day
    group_by(Day) %>%
    # Sum the well data for each day
    summarize_all(sum)
  return(out)
}
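To see what the cleaning rules do, here is a small self-contained sketch on an in-memory sample rather than a file. The values, the text "Logger off", and the helper name `clean_flow` are invented for illustration; the sketch assumes two-digit years (hence `%y`), as in the sample rows of the question, and uses base R's `aggregate()` in place of the dplyr pipeline.

```r
# Hypothetical minute-level sample with one text error and two negative readings
raw <- data.frame(
  Date   = c("9/1/18 0:00", "9/1/18 0:01", "9/2/18 0:00"),
  DW_20  = c("995.88", "Logger off", "1007.01"),
  DW_26A = c("-0.09", "-0.09", "4.5"),
  stringsAsFactors = FALSE
)

# Illustrative helper: text becomes NA, then NA and negative values become 0
clean_flow <- function(x) {
  x <- suppressWarnings(as.numeric(x))
  x[is.na(x) | x < 0] <- 0
  x
}

# Apply the helper to every column except Date
raw[-1] <- lapply(raw[-1], clean_flow)

# "%y" reads two-digit years; the trailing time of day is ignored
raw$Day <- as.Date(raw$Date, format = "%m/%d/%y")

# Sum each well's readings by day
daily <- aggregate(cbind(DW_20, DW_26A) ~ Day, data = raw, FUN = sum)
```

Checking `daily` against a hand calculation for one day is a quick way to confirm that the date format matches the files before processing a whole folder.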
You can then use list.files() to search the folder for .csv files:
folder <- "C:/path/to/your/data/"
files <- list.files(path = folder, pattern = "\\.csv$", full.names = TRUE)
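The `pattern` argument of `list.files()` is a regular expression rather than a shell glob, so the dot should be escaped. The sketch below illustrates the difference in a temporary directory; the directory name `pattern_demo` and the file `old_csv` are made up for the demonstration.

```r
# Hypothetical demo directory with two data files and one look-alike
demo_dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir,
  c("Jul_2017_Well_Flows.csv", "Aug_2017_Well_Flows.csv", "old_csv")
)))

# Escaped dot: only names that really end in ".csv" match
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Unescaped dot matches any character, so "old_csv" is picked up too
loose <- list.files(demo_dir, pattern = ".csv$")

# glob2rx() translates a glob into the equivalent regular expression
rx <- glob2rx("*.csv")
same <- list.files(demo_dir, pattern = rx, full.names = TRUE)
```

Note that `list.files()` returns names in alphabetical order, so "Aug" comes before "Jul"; if chronological order matters, the combined result can be sorted by day afterwards.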
You can then use lapply() to loop over the files and apply the function to each one:
# loop over files
combined <- lapply(files, process_csv) %>%
bind_rows()
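If it helps to know which file each row came from, `bind_rows()` can record the names of a named list in an extra column through its `.id` argument. The sketch below uses two hand-made data frames in place of real `process_csv()` output; the column name `source_file` and the values are assumptions for illustration.

```r
library(dplyr)

# Stand-ins for per-file results, named after hypothetical source files
results <- list(
  "Jul_2017_Well_Flows.csv" = data.frame(Day = as.Date("2017-07-01"), DW_20 = 10),
  "Aug_2017_Well_Flows.csv" = data.frame(Day = as.Date("2017-08-01"), DW_20 = 20)
)

# .id stores each list name in a new column; arrange() sorts chronologically
combined <- bind_rows(results, .id = "source_file") %>%
  arrange(Day)
```

With real files, a named list could be produced with something like `lapply(setNames(files, basename(files)), process_csv)`, since `lapply()` keeps the names of its input.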