How do I get the mean of a range of CSV rows directly in R?

Time: 2017-06-08 08:41:33

Tags: r csv statistics mean


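Below is my code, which reads more than 1000 csv files, each with more than 1000 rows. Every csv file has exactly 4 columns, e.g. ID, value, param1, param2. My current snippet reads each file into a data frame and tags it with its file name. On its own it is clean, and because it is already implemented, I am only looking for code that I can integrate into it.

Example input: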

200 4.864 ne15 hx1
201 4.872 ne12 hx3
202 4.898 ne10 hx9
203 4.815 ne23 hx1
204 4.699 ne14 hx3
...
212 4.813 ne20 hx2
213 4.763 ne18 hx8
...

Example output:

index  row#.   value     filename
# mean should be the value for row 2 to 20
# it needs to be output in R under row 202
154    202.0   4.337     1wq.csv
164    225.0   4.358     1wq.csv
174    250.0   4.421     1wq.csv
184    275.0   4.498     1wq.csv
194    300.0   4.513     1wq.csv

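Instead of picking up the consecutive values from rows 2 to 20 of that column (19 values), I would like to get a single mean of the values in rows 2 to 20. How can I do that?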

# set the working directory to the folder where the csv files are located
files <- list.files(pattern = '\\.csv$')
m = data.frame()
for (k in seq_along(files)) {
    csv = read.csv(files[k], header = FALSE)

    # pick rows 2:20 (consecutive), plus rows 50, 120, 150 and so on
    data = csv[c(2:20, 50, 120, 150, 175, 200), c(1,2)]

    # tried a pivot (row/column transform): data <- as.data.frame(t(data))
    # but that line messed up the data
    # when the selected values contain NA/blanks
    data$file = files[k]

    m = rbind(m, data)
}

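Thanks to both answers, I ended up with the following. I will try AdamQuek's answer separately again to improve mine. For now, this is how I am solving the problem.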

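m = data.frame()
for (k in seq_along(files)) {
  csv = read.csv(files[k], header = FALSE)
  data = csv[c(2:20, 225, 250, 275, 300, 325, 350), c(1,2)]
  # overwrite the first selected row with the mean of column 2;
  # note that data rows 2:19 are csv rows 3:20, so csv row 2 is not in the mean
  data[1,] <- mean(data[c(2:19),c(2)], na.rm=TRUE)
  # drop the rows that were averaged
  data <- data[-2:-19,]
  # label the mean row
  data[c(1),c(1)] = 200
  data$file = files[k]
  # transpose rows and columns
  data <- as.data.frame(t(data))
  m = rbind(m, data)
}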

2 Answers:

Answer 0 (score: 1)

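files <- list.files(pattern='\\.csv')
# read every file into a list of data frames
all <- lapply(files, read.csv, header=FALSE)
# keep the selected rows and the first two columns of each file
all.subset <- lapply(all, function(x)x[c(2:20, 50, 120, 150, 175, 200), c(1,2)])

# column means, ignoring NA values
col.means <- function(x) colMeans(x, na.rm=TRUE)

# one row of column means per file
do.call(rbind, lapply(all.subset, col.means))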

Edit:

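files <- list.files(pattern='\\.csv')
m <- data.frame()

for (k in files){
    csv <- read.csv(k, header = FALSE)[, c(1,2)]

    # mean of rows 2:20 for each of the two columns
    v1 <- mean(csv[2:20,1], na.rm=TRUE)
    v2 <- mean(csv[2:20,2], na.rm=TRUE)
    # use the default names V1/V2 so this row binds onto the subset below
    mean.val <- data.frame(V1=v1, V2=v2)

    subset.data <- csv[c(50, 120, 150, 175, 200),]
    subset.data <- rbind(mean.val, subset.data)
    # tag every row with the file it came from
    subset.data$file <- k

    m <- rbind(m, subset.data)
}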
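
The per-file step of the loop above could also be factored into a small helper. The following is only a sketch under assumptions: `summarise_rows` is a hypothetical name, the toy data frame stands in for one CSV file, and labelling the mean row with the first row number of the averaged range is a choice made here for illustration, not something the question specifies.

```r
# Hypothetical helper: collapse rows `mean_rows` of column 2 into one mean row,
# then append the individually picked rows `keep_rows`.
summarise_rows <- function(df, mean_rows = 2:20,
                           keep_rows = c(50, 120, 150, 175, 200)) {
  mean_row <- data.frame(
    row   = min(mean_rows),                       # label for the mean row (assumption)
    value = mean(df[mean_rows, 2], na.rm = TRUE)  # mean over the row range
  )
  picked <- data.frame(row = keep_rows, value = df[keep_rows, 2])
  rbind(mean_row, picked)
}

# Toy data standing in for one CSV file (values are made up)
toy <- data.frame(V1 = 1:250, V2 = as.numeric(1:250))
res <- summarise_rows(toy)
res$file <- "toy.csv"
print(res)

# With real files, one possible way to combine the results (untested on the
# asker's data):
# do.call(rbind, lapply(files, function(f)
#   cbind(summarise_rows(read.csv(f, header = FALSE)), file = f)))
```

Building the pieces per file and binding them once with `do.call(rbind, ...)` avoids growing `m` inside the loop, which may matter with more than 1000 files.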

Answer 1 (score: 1)

Is this what you are trying to accomplish?

# Insert the mean of rows 2:20 into row 202
csv[202,"value"] <- mean(csv[2:20,"value"])

# Drop rows 2:20 from the dataframe
csv <- csv[-2:-20,]
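
To see what these two steps do, here is a small sketch on made-up data. It assumes a data frame with a column named `value` (a file read with `header = FALSE` would have `V2` instead), and the toy values are hypothetical.

```r
# Toy data standing in for one CSV file (values are made up)
csv <- data.frame(ID = 1:250, value = as.numeric(1:250))

# Put the mean of rows 2:20 into row 202, then drop rows 2:20
csv[202, "value"] <- mean(csv[2:20, "value"], na.rm = TRUE)
csv <- csv[-(2:20), ]

# After the drop the positions shift, but the original row names are kept,
# so the mean can still be looked up by its row name "202"
print(csv["202", ])
```

Note that after dropping rows, numeric indexing refers to the new positions, so looking up the mean by row name is the safer option in this sketch.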