How can I split a large CSV data file into individual data files using R?

Asked: 2010-07-31 02:25:24

Tags: r csv

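I have a CSV file whose first row contains the variable names and whose remaining rows contain the data. What is a good way to break it up in R into files that each contain just one variable? Will the solution be robust, for example if the input file is 100G in size?

The input file looks like this: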

var1,var2,var3
1,2,hello
2,5,yay
...

I want to create 3 files (or as many as there are variables), var1.csv, var2.csv and var3.csv, so that the files look like this:

File 1

var1
1
2
...

File 2

var2
2
5
...

File 3

var3
hello
yay

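I got a solution in Python (How to break a large CSV data file into individual data files?), but I would like to know whether R can do the same thing. The essence of the Python code is that it reads the CSV file line by line and writes the lines out one at a time. Can R do this? The read.csv command reads the whole file at once, which can slow down the entire process. Also, because R tries to read the whole file into memory, it cannot read and process a 100G file. I could not find a command in R that reads a CSV file line by line. Please help. Thanks!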

1 Answer:

Answer 0 (score: 6)

You can scan the file and then write it out, one line at a time.

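i <- 0
# Re-scan file.csv on every pass, skipping the i lines already handled.
# The loop stops once scan() returns fewer than two fields (end of file).
while({x <- scan("file.csv", sep = ",", skip = i, nlines = 1, what = "character");
       length(x) > 1}) {
  # Append each field of the current line to its own output file.
  write(x[1], "file1.csv", sep = ",", append = TRUE)
  write(x[2], "file2.csv", sep = ",", append = TRUE)
  write(x[3], "file3.csv", sep = ",", append = TRUE)
  i <- i + 1
}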

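Edit: I used the data above, replicated more than 1000 times, and compared the speed of the code above with a version that keeps the file connections open the whole time.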

# ver1: re-opens file.csv on every pass and skips the lines already handled.
ver1 <- function() {
  i <- 0
  while({x <- scan("file.csv", sep = ",", skip = i, nlines = 1, what = "character");
         length(x) > 1}) {
    write(x[1], "file1.csv", sep = ",", append = TRUE)
    write(x[2], "file2.csv", sep = ",", append = TRUE)
    write(x[3], "file3.csv", sep = ",", append = TRUE)
    i <- i + 1
  }
}

system.time(ver1()) # with close to 3K lines of data, 3 columns
##    user  system elapsed 
##   2.809   0.417   3.629 
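
One likely reason ver1 is slow, offered as a rough count rather than a timing model: each call to scan("file.csv", skip = i, ...) opens the file again and has to pass over the i lines that were already handled. The helper names below are hypothetical, and the count ignores the cost of opening and closing files on every pass.

```r
# Rough count of the lines a reader passes over for a file with n lines.

# Re-scanning with skip = i passes over the i skipped lines plus the one
# line it actually reads, so the total is 1 + 2 + ... + n.
lines_touched_rescan <- function(n) sum(seq_len(n))

# An open connection remembers its position, so each line is read once.
lines_touched_stream <- function(n) n

n <- 3000
lines_touched_rescan(n)   # total line passes when re-scanning
lines_touched_stream(n)   # total line passes with an open connection
```

For about 3,000 lines that is roughly 4.5 million line passes against 3,000. Measured run times will not differ by the same factor, because the fixed overhead of each call also matters.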

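# ver2: opens every file once and keeps the connections open while looping.
ver2 <- function() {
  f <- file("file.csv", "r")
  f1 <- file("file1.csv", "w")
  f2 <- file("file2.csv", "w")
  f3 <- file("file3.csv", "w")
  # scan() continues from the current position of the open connection,
  # so no skip is needed. append is omitted: it only applies to file names.
  while({x <- scan(f, sep = ",", nlines = 1, what = "character");
         length(x) > 1}) {
    write(x[1], file = f1, sep = ",", ncolumns = 1)
    write(x[2], file = f2, sep = ",", ncolumns = 1)
    write(x[3], file = f3, sep = ",", ncolumns = 1)
  }
  # Close only the connections opened here, not every open connection.
  close(f); close(f1); close(f2); close(f3)
}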

system.time(ver2())
##   user  system elapsed 
##   0.257   0.098   0.409
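
Both functions above hardcode three columns. The following is a minimal sketch of the same open-connection idea for any number of columns. It is an illustration, not part of the original answer: split_columns, out_dir and chunk_lines are hypothetical names. It assumes that every row has the same number of fields as the header, that no field contains an embedded newline or a comma that would need re-quoting, and that the variable names are valid file names.

```r
# Hypothetical helper: write each column of a CSV file to its own file,
# reading at most `chunk_lines` rows at a time through open connections.
split_columns <- function(input, out_dir = ".", chunk_lines = 10000L) {
  con <- file(input, "r")
  on.exit(close(con), add = TRUE)

  # The first line holds the variable names and fixes the column count.
  header <- scan(con, what = "character", sep = ",", nlines = 1, quiet = TRUE)

  # One output connection per variable, named <variable>.csv.
  outs <- lapply(header, function(v) {
    file(file.path(out_dir, paste0(v, ".csv")), "w")
  })
  on.exit(for (o in outs) close(o), add = TRUE)
  for (j in seq_along(header)) writeLines(header[j], outs[[j]])

  repeat {
    # scan() resumes where the previous call stopped on the connection.
    x <- scan(con, what = "character", sep = ",",
              nlines = chunk_lines, quiet = TRUE)
    if (length(x) == 0L) break  # end of file
    # Fields arrive row by row; reshape so that column j is m[, j].
    m <- matrix(x, ncol = length(header), byrow = TRUE)
    for (j in seq_along(header)) writeLines(m[, j], outs[[j]])
  }
  invisible(header)
}
```

Called as split_columns("file.csv"), this would write var1.csv, var2.csv and var3.csv for the sample input. Only chunk_lines rows are held in memory at a time, so memory use does not grow with the size of the input file. R limits how many connections can be open at once, so a file with very many columns would need its columns processed in batches.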