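Below is my code, which reads more than 1000 csv files, each with more than 1000 rows and 4 columns. Every csv file has exactly 4 columns, e.g. ID, value, param1, param2. My current snippet reads each file into a data frame and tags the rows with the file name. That part is clean and already works, so I am only looking for code that I can integrate into my function.
Example input: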
200 4.864 ne15 hx1
201 4.872 ne12 hx3
202 4.898 ne10 hx9
203 4.815 ne23 hx1
204 4.699 ne14 hx3
...
212 4.813 ne20 hx2
213 4.763 ne18 hx8
...
Example output:
index row#. value filename
# mean should be the value for row 2 to 20
# it needs to be output in R under row 202
154 202.0 4.337 1wq.csv
164 225.0 4.358 1wq.csv
174 250.0 4.421 1wq.csv
184 275.0 4.498 1wq.csv
194 300.0 4.513 1wq.csv
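Instead of pulling the consecutive values from rows 2 to 20 of each csv file's column (18 values), I want a single mean of the values in rows 2 through 20. How can I do this?
#set working directory to the folder where the csv files are located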
files <- list.files(pattern='.csv')
m = data.frame()
for (k in 1:length(files)){
csv = read.csv(files[k], header = FALSE)
#picking up 2:20 consecutive values, value for row 50,120,150 so on
data = csv[c(2:20, 50, 120, 150, 175, 200), c(1,2)]
#-pivot transform col/row- data <- as.data.frame(t(data))
#but that line screwed up the data
#when those selected values are with NA/blanks
data$file = files[k]
m = rbind(m, data)
}
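Thanks for both answers. Here is what I did in the meantime; I will also try AdamQuek's answer separately to improve on mine. For now, this is how I am working around the problem.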
m = data.frame()
for (k in 1:length(files)) {
csv = read.csv(files[k], header = FALSE)
data = csv[c(2:20, 225, 250, 275, 300, 325, 350), c(1,2)]
data[1,] <- mean(data[c(2:19),c(2)], na.rm=T)
data <- data[-2:-19,]
data[c(1),c(1)] = 200
data$file = files[k]
data <- as.data.frame(t(data))
m = rbind(m, data)
}
Answer 0 (score: 1)
files <- list.files(pattern='\\.csv')
all <- lapply(files, read.csv, header=FALSE)
all.subset <- lapply(all, function(x)x[c(2:20, 50, 120, 150, 175, 200), c(1,2)])
col.means <- function(x) colMeans(x, na.rm=T)
do.call(rbind, lapply(all.subset, col.means))
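To make the shape of that result concrete, here is a minimal sketch that runs the same pipeline on two in-memory data frames instead of files. The `make_toy` helper and its values are made up for illustration; the column names `V1` and `V2` mirror what `read.csv(header = FALSE)` would assign.

```r
# Toy data frames standing in for files read with header = FALSE
# (hypothetical data; make_toy is not part of the original code)
make_toy <- function(offset) {
  data.frame(V1 = 1:250, V2 = offset + (1:250) / 100)
}
all <- list(make_toy(4), make_toy(5))

rows <- c(2:20, 50, 120, 150, 175, 200)
all.subset <- lapply(all, function(x) x[rows, c(1, 2)])

# colMeans collapses ALL selected rows into one mean per column,
# so each input contributes exactly one row to the result
res <- do.call(rbind, lapply(all.subset, colMeans, na.rm = TRUE))
print(dim(res))
print(res)
```

Note that this averages rows 50, 120, 150, 175 and 200 together with rows 2 to 20, which is likely not what the question asks for if those rows should be kept as individual values.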
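Edit:
files <- list.files(pattern='\\.csv')
m <- data.frame()
for (k in files){
  # keep only the first two columns (named V1 and V2 because header = FALSE)
  csv <- read.csv(k, header = FALSE)[, c(1,2)]
  # mean of rows 2:20 for each column
  v1 <- mean(csv[2:20,1], na.rm=TRUE)
  v2 <- mean(csv[2:20,2], na.rm=TRUE)
  # use the same column names as csv so that rbind() can stack the rows
  mean.val <- data.frame(V1=v1, V2=v2)
  subset.data <- csv[c(50, 120, 150, 175, 200),]
  subset.data <- rbind(mean.val, subset.data)
  # tag every row with its source file after stacking
  subset.data$file <- k
  m <- rbind(m, subset.data)
}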
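The same idea can also be wrapped in a function and applied with `lapply`, which avoids growing `m` inside a loop. This is only a sketch under assumptions: `summarise_file` and its parameters are hypothetical names, and the csv files are toy files written to a temporary directory so the example runs on its own.

```r
# Hypothetical helper: collapse rows 2:20 into one mean row,
# keep the other selected rows as they are, and tag with the file name.
summarise_file <- function(path, mean_rows = 2:20,
                           keep_rows = c(50, 120, 150, 175, 200)) {
  csv <- read.csv(path, header = FALSE)[, c(1, 2)]
  # one-row data frame with the same column names (V1, V2) as csv
  mean_row <- as.data.frame(as.list(colMeans(csv[mean_rows, ], na.rm = TRUE)))
  out <- rbind(mean_row, csv[keep_rows, ])
  out$file <- basename(path)
  out
}

# Toy files in a temporary directory (made-up data for illustration only)
tmp <- tempfile("csvdemo")
dir.create(tmp)
for (nm in c("a.csv", "b.csv")) {
  toy <- data.frame(id = 1:250, value = 4 + (1:250) / 1000)
  write.table(toy, file.path(tmp, nm), sep = ",",
              row.names = FALSE, col.names = FALSE)
}

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
m <- do.call(rbind, lapply(files, summarise_file))
print(head(m))
```

With real data, `files` would come from `list.files(pattern='\\.csv')` in the working directory, as in the loop above.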
Answer 1 (score: 1)
Is this what you are trying to accomplish?
# Insert the mean of rows 2:20 into row 202
csv[202,"value"] <- mean(csv[2:20,"value"])
# Drop rows 2:20 from the dataframe
csv <- csv[-2:-20,]
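A small self-contained sketch of this approach on toy data may help. It assumes the columns are named `id` and `value`; files read with `header = FALSE` get `V1`, `V2`, ... instead, so the names would have to be assigned first. It also addresses the target row by its ID rather than by position, which is an assumption about what "row 202" means in the question.

```r
# Toy data standing in for one file (hypothetical values)
csv <- data.frame(id = 1:300, value = 4 + (1:300) / 1000)

# For a file read with header = FALSE, name the columns first, e.g.:
# names(csv)[1:2] <- c("id", "value")

# Mean of rows 2:20, ignoring missing values
m20 <- mean(csv[2:20, "value"], na.rm = TRUE)

# Store the mean in the row whose ID is 202 (by ID, not by position)
csv[csv$id == 202, "value"] <- m20

# Drop rows 2:20 from the data frame
csv <- csv[-(2:20), ]
print(csv[csv$id == 202, ])
```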