Question

I'm very new to R, but I do program. I'm probably just frustrated with my own lack of progress at this stage, so here is my question:

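I have a lot of large (6 MB) .csv files containing spectral data that I need to analyse afterwards. I'm trying to read in the data: two columns, Frequency and Voltage (V as a dB value), with 500,000 data points per file. I would like to "merge" the data from column 2 of every 10 files into a new dataset.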

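For example: 10 files, 10 frequency columns (identical in every file, so they can be ignored for now) and 10 voltage columns. Take the data from Voltage in the second column and merge it into one dataset. With 10 files I end up with one dataset; with 100 files, 10 datasets. Ideally each dataset ends up with 11 columns: | Frequency | V1 | V2 | ... | V10 |. Index matching on each file would be nice, but until I upgrade my resources I'm not sure my PC can handle it.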

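This may seem quite basic, and all suggestions are welcome. Memory seems to be a problem when trying to sort the 1,200 .csv files, or even when just reading 100 files. Thanks for your time!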

Answer 1

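I haven't tested this, since I obviously don't have your data, but the code below should work. Basically, you create a vector of all the file names, then read, combine, and write them ten at a time.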

library(reshape2)
library(dplyr)

# Get the names of all the csv files
files = list.files(pattern="\\.csv$")

# Read, combine, and save ten files at a time in each iteration of the loop
for (i in unique((seq_along(files) - 1) %/% 10)) {

  # Read ten files at a time into a list
  dat = lapply(files[(1:length(files) - 1) %/% 10 == i], function(f) {
    d=read.csv(f, header=TRUE, stringsAsFactors=FALSE)
    # Add file name as a column
    d$file = gsub("(.*)\\.csv$", "\\1", f)
    return(d)
  })

  # Combine the ten files into a single data frame
  dat = bind_rows(dat)

  # Reshape from long to wide format
  dat = dcast(dat, Frequency ~ file, value.var="Voltage")

  # Write to csv
  write.csv(dat, paste0("Files_", i, ".csv"), row.names=FALSE)
}
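
The batching in the loop relies on integer division of each file's position by 10. As a minimal sketch, using made-up file names (`spectrum_01.csv` and so on are assumptions for illustration, not the real files), you can check which files land in which batch before reading anything:

```r
# Hypothetical file names, for illustration only
files <- sprintf("spectrum_%02d.csv", 1:25)

# Integer division maps positions 1-10 to batch 0, 11-20 to batch 1, and so on
batch <- (seq_along(files) - 1) %/% 10

# Split the file names into one character vector per batch
batches <- split(files, batch)

# Number of files in each batch (the last batch may hold fewer than ten)
lengths(batches)
```

If the number of files is not a multiple of ten, the final batch is simply smaller, so the last output file will have fewer voltage columns.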

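On the other hand, if you want to combine them all into a single file in long format, which will make the analysis easier (provided you have enough memory):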

  # Read all files into a list
  dat = lapply(files, function(f) {
    d = read.csv(f, header=TRUE, stringsAsFactors=FALSE)
    # Add file name as a column
    d$file = gsub("(.*)\\.csv$", "\\1", f)
    return(d)
  })

  # Combine into a single data frame
  dat = bind_rows(dat)

  # Save to csv
  write.csv(dat, "All_files_combined.csv", row.names=FALSE)
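
Since the question says the frequency column is identical in every file and memory is tight, one lighter alternative is to skip the reshape and read only the voltage column from each file. This is an untested sketch, not a drop-in replacement: `combine_batch` is a hypothetical helper name, and it assumes every file has a header row, exactly two numeric columns, and the same rows in the same order.

```r
# Sketch: build one wide data frame from a batch of files without reshaping.
# Assumes two numeric columns per file and identical frequencies in the
# same row order in every file (as described in the question).
combine_batch <- function(batch_files) {
  # Take the frequency column from the first file only;
  # colClasses = "NULL" tells read.csv to skip that column
  freq <- read.csv(batch_files[1], header = TRUE,
                   colClasses = c("numeric", "NULL"))[[1]]

  # From every file, read only the second (voltage) column
  volts <- lapply(batch_files, function(f) {
    read.csv(f, header = TRUE, colClasses = c("NULL", "numeric"))[[1]]
  })
  names(volts) <- paste0("V", seq_along(volts))

  # Frequency plus one voltage column per file: Frequency, V1, V2, ...
  data.frame(Frequency = freq, volts)
}
```

Inside the loop above you could then call something like `combine_batch(files[(seq_along(files) - 1) %/% 10 == i])` and write the result with `write.csv`. Because the rows are matched by position rather than by frequency value, this only gives correct results if the files really do share the same frequency grid.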

Reading, merging & sorting .csv files
