Process multiple CSV files at once and paste the results into new columns

Time: 2019-02-13 16:28:05

Tags: r for-loop

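I'm learning how to use for loops in R, but this task seems a bit too complex for what I can currently do.

I have several files named in the format "collar41361_41365.0.x.csv" and want to run a series of calculations on each one, with the results written to new columns in the same file.

So far I have done this for only one file at a time, but I would like to automate it for all of the "collar41361_41365.0.x.csv" files.

Here is a small sample of what a "collar41361_41365.0.x.csv" file looks like: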

> collaraccuracy<-fread("collar41361_41365.0.8.csv",stringsAsFactors = F)
> print(collaraccuracy)
      V1  observed predicted probability results1 results2       results
  1:   1   Head-up Vigilance   0.2727273 NEGATIVE     TRUE TRUE_NEGATIVE
  2:   2   Head-up   Grazing   0.7272727 NEGATIVE     TRUE TRUE_NEGATIVE
  3:   3   Head-up   Grazing   0.7272727 NEGATIVE     TRUE TRUE_NEGATIVE
  4:   4   Head-up   Grazing   0.5454545 NEGATIVE     TRUE TRUE_NEGATIVE
  5:   5   Head-up   Grazing   0.7272727 NEGATIVE     TRUE TRUE_NEGATIVE

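I need to count the total number of "TRUE_POSITIVES" (TP), "FALSE_POSITIVES" (FP), "TRUE_NEGATIVES" (TN) and "FALSE_NEGATIVES" (FN), and then calculate a series of measures, such as:

1) Accuracy = (tn + tp) / (tn + tp + fn + fp)

2) Precision = tp / (tp + fp)

3) Recall = tp / (tp + fn)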
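
The three formulas above can be sketched as a small R function. This is only an illustration: the name classification_metrics and the counts passed to it are made up here and are not from the question. Note that R returns NaN rather than an error when a denominator is zero (for example, precision when a file contains no positives at all).

```r
# Hypothetical helper (not from the original post): compute the three
# measures from the four confusion-matrix counts.
classification_metrics <- function(tp, fp, tn, fn) {
  c(
    accuracy  = (tn + tp) / (tn + tp + fn + fp),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn)
  )
}

# Made-up counts, for illustration only
m <- classification_metrics(tp = 8, fp = 2, tn = 85, fn = 5)
print(round(m, 3))  # accuracy 0.93, precision 0.8, recall about 0.615
```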

This is how I handle it when analysing a single file:

library(data.table)  # provides fread()

collaraccuracy <- fread("collar41361_41365.0.8.csv", stringsAsFactors = FALSE)

# count how many rows carry each outcome label
tp <- length(grep("TRUE_POSITIVE", collaraccuracy$results))
fp <- length(grep("FALSE_POSITIVE", collaraccuracy$results))
tn <- length(grep("TRUE_NEGATIVE", collaraccuracy$results))
fn <- length(grep("FALSE_NEGATIVE", collaraccuracy$results))

accuracy <- (tn + tp) / (tn + tp + fn + fp)
accuracy
precision <- tp / (tp + fp)
precision
recall <- tp / (tp + fn)
recall
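
The counting step above uses grep(), which treats its first argument as a regular expression and matches substrings. That works for these four labels, but an exact comparison may be a safer way to count. A minimal sketch, assuming the results column only ever holds the four labels shown; the vector and the count_label helper are made up for illustration:

```r
# Made-up stand-in for collaraccuracy$results
results <- c("TRUE_NEGATIVE", "TRUE_NEGATIVE", "TRUE_POSITIVE",
             "FALSE_NEGATIVE", "TRUE_NEGATIVE")

# Hypothetical helper: count entries that are exactly equal to `label`
count_label <- function(x, label) sum(x == label, na.rm = TRUE)

tp <- count_label(results, "TRUE_POSITIVE")
fp <- count_label(results, "FALSE_POSITIVE")
tn <- count_label(results, "TRUE_NEGATIVE")
fn <- count_label(results, "FALSE_NEGATIVE")

print(c(tp = tp, fp = fp, tn = tn, fn = fn))
```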

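I would like to create a for loop that will:

1) Read all files named in the format "collar41361_41365.0.x.csv" and calculate the accuracy, precision and recall values for each file.

2) For each file, create three columns titled "accuracy", "precision" and "recall", and paste the results of the formulas below them.

Any help is sincerely appreciated!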

1 Answer:

Answer 0: (score: 1)

Something like this should work. I'm not sure I fully understand the expected output.

# setwd('') # to the folder where your csv files are
# 'pattern' is a regular expression, so the dots are escaped; the leading '^'
# keeps the 'new...' output files from being picked up again on a re-run
f <- list.files(path = getwd(), full.names = FALSE,
                pattern = '^collar41361_41365\\.0\\..*\\.csv$')

# seq_along() is safe even when no file matches (1:length(f) would give 1, 0)
for (i in seq_along(f)) {
  collaraccuracy <- data.table::fread(f[i], stringsAsFactors = FALSE)

  # count how many rows carry each outcome label
  tp <- length(grep("TRUE_POSITIVE", collaraccuracy$results))
  fp <- length(grep("FALSE_POSITIVE", collaraccuracy$results))
  tn <- length(grep("TRUE_NEGATIVE", collaraccuracy$results))
  fn <- length(grep("FALSE_NEGATIVE", collaraccuracy$results))

  # append the results as new columns (the same value is repeated on every row)
  collaraccuracy$accuracy <- (tn + tp) / (tn + tp + fn + fp)
  collaraccuracy$precision <- tp / (tp + fp)
  collaraccuracy$recall <- tp / (tp + fn)

  # you may want to write them to a different directory
  data.table::fwrite(collaraccuracy, file = paste0('new', f[i]))
}
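
As a variation on the loop above, the per-file work can be wrapped in a function and applied with lapply(), which also makes it easy to collect one summary row per file. This is a sketch, not the answerer's method: process_file, the temporary folders, the demo data and the assumption that "x" in the file name is a number are all made up for illustration.

```r
library(data.table)

# Hypothetical helper: read one file, append the three metric columns,
# write a copy to out_dir, and return a one-row summary.
process_file <- function(path, out_dir) {
  dt <- fread(path, stringsAsFactors = FALSE)
  tp <- sum(dt$results == "TRUE_POSITIVE")
  fp <- sum(dt$results == "FALSE_POSITIVE")
  tn <- sum(dt$results == "TRUE_NEGATIVE")
  fn <- sum(dt$results == "FALSE_NEGATIVE")
  acc  <- (tn + tp) / (tn + tp + fn + fp)
  prec <- tp / (tp + fp)
  rec  <- tp / (tp + fn)
  dt$accuracy  <- acc   # scalar is recycled down the column
  dt$precision <- prec
  dt$recall    <- rec
  fwrite(dt, file.path(out_dir, paste0("new_", basename(path))))
  data.table(file = basename(path), accuracy = acc,
             precision = prec, recall = rec)
}

# Demo on a made-up file in temporary folders (assumed layout)
in_dir  <- tempfile("collar_in_");  dir.create(in_dir)
out_dir <- tempfile("collar_out_"); dir.create(out_dir)
demo <- data.table(V1 = 1:4,
                   results = c("TRUE_NEGATIVE", "TRUE_POSITIVE",
                               "FALSE_POSITIVE", "TRUE_NEGATIVE"))
fwrite(demo, file.path(in_dir, "collar41361_41365.0.8.csv"))

# Assumes the "x" part of the file name is numeric
files <- list.files(in_dir, full.names = TRUE,
                    pattern = "^collar41361_41365\\.0\\.[0-9]+\\.csv$")
summary_table <- rbindlist(lapply(files, process_file, out_dir = out_dir))
print(summary_table)
```

Writing the copies to a separate folder keeps the original files untouched and means a second run never reads its own output.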