Question

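I suspect there is a better way to do this, but I don't know what it is. I do this a lot for multiple sets of similar data. In what other ways could I structure this so that it is quick and easy to read?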

indices <- c("indexA", "indexB", "indexC")

for(i in 1:length(indices)){
  index <- indices[i]

  #This line reads CSV of the given index in the loop.
  eval(parse(text=paste0(index, " <- ",
                      "read.csv('data/", index, ".csv', skip=8)")))

  # This line does some type of computation using the newly imported data set
  eval(parse(text=paste0(index, "$newColumn <- ", 
                     index, "[1, 'Column.D']")))

}

Answer 1

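There is nothing wrong with using a for loop to read the data files here, but using eval() and parse() is unnecessary. Use the tools provided. It is best to save the results in a list (in this case it appears to be a list of data.frames)...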

indices <- c("indexA", "indexB", "indexC")

#  Pre-allocate your result vector
ll <- vector( mode = "list" , length = length(indices) )

for(i in 1:length(indices)){
  index <- indices[i]

  #  Read file into 'temporary'  object that will get overwritten in next loop iteration
  tmp <- read.csv( paste0( "data/", index , ".csv" ) , skip=8 )

  #  Do some processing on it
  tmp$newColumn <- tmp[ 1 , 'Column.D']

  #  Store result in list vector
  ll[[i]] <- tmp
}
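Once the data frames are in a list, naming the elements after indices lets you retrieve each one by name, which covers most of what separate variables would give you. The following is a minimal sketch, not part of the original answer: the temporary directory, the 8 filler lines, and the column Column.A are made up so the example runs anywhere.

```r
# Hypothetical setup: write three small CSV files to a temporary directory,
# each with 8 filler lines before the header (to match skip = 8).
data_dir <- file.path(tempdir(), "data_named_list")
dir.create(data_dir, showWarnings = FALSE)
indices <- c("indexA", "indexB", "indexC")
for (index in indices) {
  lines <- c(rep("filler", 8), "Column.A,Column.D", "1,10", "2,20")
  writeLines(lines, file.path(data_dir, paste0(index, ".csv")))
}

# Same loop as above, reading from the temporary directory
ll <- vector(mode = "list", length = length(indices))
for (i in seq_along(indices)) {
  tmp <- read.csv(file.path(data_dir, paste0(indices[i], ".csv")), skip = 8)
  tmp$newColumn <- tmp[1, "Column.D"]
  ll[[i]] <- tmp
}

# Name the list elements so each data frame can be looked up by its index name
names(ll) <- indices
head(ll[["indexB"]])
```

seq_along(indices) is used instead of 1:length(indices) because it also behaves sensibly when indices is empty.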

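If you want to process all the files in a directory, you could also consider list.files(). Its pattern argument takes a regular expression that restricts which file names are selected, e.g. all files named index*, where * is a letter from A to Z...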

fls <- list.files( path = "data" , pattern = "index[A-Z]" , full.names = TRUE )
for( i in fls ){
...
}
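As a rough illustration of how the list.files() loop might be filled in, here is a sketch under stated assumptions: the temporary directory and file contents are invented, and the pattern is tightened to "^index[A-Z]\\.csv$" (anchors and extension are my addition, not the answer's) so that only the CSV files match. The loop body simply reuses the processing step from the first example.

```r
# Hypothetical setup: two matching CSV files plus one file that should be ignored
data_dir <- file.path(tempdir(), "data_list_files")
dir.create(data_dir, showWarnings = FALSE)
for (index in c("indexA", "indexB")) {
  lines <- c(rep("filler", 8), "Column.A,Column.D", "1,10", "2,20")
  writeLines(lines, file.path(data_dir, paste0(index, ".csv")))
}
writeLines("not a data file", file.path(data_dir, "notes.txt"))

# Find the data files; full.names = TRUE returns paths that can be read directly
fls <- list.files(path = data_dir, pattern = "^index[A-Z]\\.csv$", full.names = TRUE)

ll <- vector(mode = "list", length = length(fls))
for (i in seq_along(fls)) {
  tmp <- read.csv(fls[i], skip = 8)
  tmp$newColumn <- tmp[1, "Column.D"]
  ll[[i]] <- tmp
}

# Derive element names from the file names, e.g. ".../indexA.csv" becomes "indexA"
names(ll) <- tools::file_path_sans_ext(basename(fls))
names(ll)
```

With this approach the indices vector no longer has to be maintained by hand; adding a file to the directory is enough.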

Answer 2

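Here is a version with sapply that saves you from having to index into the results yourself. I first define a function that does the work, then use sapply to create the data.frames, and finally assign those data.frames to the variable names given in indices using list2env (note that sapply returns a list of three data frames here, each item named after its entry in indices, which is why this works).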

indices <- c("indexA", "indexB", "indexC")

my_fun <- function(index) {   # this does the work
  df <- read.csv(paste0("data/", index, ".csv"), skip=8)
  transform(df, newColumn = Column.D[1])  # first value of Column.D, as in the question
}    
list2env(                     # assign to global env
  sapply(indices, my_fun, simplify = FALSE),  # apply fun, returns list of 3 data frames
  envir=globalenv()
)

This gives you three data frames in your global environment, named after the entries of indices:

ls()
# [1] "indexA"  "indexB"  "indexC"  "indices" "my_fun"
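To see why the names carry over, it may help to look at the two steps separately. The sketch below is an illustration rather than the answer's code: make_df is a hypothetical stand-in for the CSV-reading function, and a fresh environment is used instead of the global one so the example leaves your workspace untouched.

```r
indices <- c("indexA", "indexB", "indexC")

# Hypothetical stand-in for the function that reads and processes one file
make_df <- function(index) {
  data.frame(source = index, Column.D = c(1, 2, 3))
}

# With a character vector as input, sapply names the result after its elements;
# simplify = FALSE keeps the result as a list of data frames
dfs <- sapply(indices, make_df, simplify = FALSE)
names(dfs)

# list2env creates one variable per named list element in the target environment
env <- new.env()
list2env(dfs, envir = env)
ls(env)
```

If you do not need separate variables, keeping dfs as a named list and working with dfs[["indexA"]] is often simpler.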

Loop to create and compute on multiple data sets