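I have a CSV file whose first line contains the variable names and whose remaining lines contain the data. What is a good way to split it, in R, into separate files that each contain only one variable? Will the solution be robust? For example, what if the input file is 100 GB?
The input file looks like this: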
var1,var2,var3
1,2,hello
2,5,yay
...
I want to create 3 files (or as many as there are variables), var1.csv, var2.csv, var3.csv, so that they look like this:
File 1
var1
1
2
...
File 2
var2
2
5
...
File 3
var3
hello
yay
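I got a solution in Python (How to break a large CSV data file into individual data files?), but I'm wondering whether R can do the same. The essence of the Python code is to read the CSV file line by line and write out one line at a time. Can R do this? The read.csv command reads the whole file at once, which can slow down the whole process. Also, because R tries to read the entire file into memory, it cannot read and process a 100 GB file. I couldn't find an R command that reads a CSV file line by line. Please help. Thanks!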
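For reference, base R can read a file line by line: open a connection with file() and call readLines() with n = 1 repeatedly; because the connection stays open, each call continues where the previous one stopped. A minimal sketch, assuming a small sample file named sample.csv (a hypothetical name) that holds the example input above and contains no quoted fields:

```r
# Write the example input to a small file ("sample.csv" is a hypothetical name).
writeLines(c("var1,var2,var3", "1,2,hello", "2,5,yay"), "sample.csv")

con <- file("sample.csv", open = "r")  # an open connection remembers its position
n_lines <- 0L
fields <- character(0)
repeat {
  line <- readLines(con, n = 1)        # read just the next line
  if (length(line) == 0) break         # character(0) means end of file
  n_lines <- n_lines + 1L
  # Naive split: assumes no quoted fields containing commas.
  fields <- strsplit(line, ",", fixed = TRUE)[[1]]
  print(fields)
}
close(con)
```

Only one line is held in memory at a time, so memory use does not depend on the file size.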
Answer (score: 6)
You can use scan and then write the values to the files one line at a time.
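# Re-opens file.csv on every iteration and skips i lines each time,
# so the total reading work grows quadratically with the number of lines.
i <- 0
while ({x <- scan("file.csv", sep = ",", skip = i, nlines = 1,
                  what = "character", quiet = TRUE)
        length(x) > 0}) {   # scan() returns character(0) at end of file
  write(x[1], "file1.csv", sep = ",", append = TRUE)
  write(x[2], "file2.csv", sep = ",", append = TRUE)
  write(x[3], "file3.csv", sep = ",", append = TRUE)
  i <- i + 1
}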
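EDIT: I took the data above and copied it 1,000+ times, then compared the speed against a version that keeps the file connections open the whole time.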
ver1 <- function() {
  i <- 0
  # Re-scans file.csv from the top on every iteration.
  while ({x <- scan("file.csv", sep = ",", skip = i, nlines = 1,
                    what = "character", quiet = TRUE)
          length(x) > 0}) {
    write(x[1], "file1.csv", sep = ",", append = TRUE)
    write(x[2], "file2.csv", sep = ",", append = TRUE)
    write(x[3], "file3.csv", sep = ",", append = TRUE)
    i <- i + 1
  }
}
system.time(ver1()) # w/ close to 3K lines of data, 3 columns
## user system elapsed
## 2.809 0.417 3.629
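ver2 <- function() {
  f  <- file("file.csv", "r")    # opened once; each scan() continues from here
  f1 <- file("file1.csv", "w")
  f2 <- file("file2.csv", "w")
  f3 <- file("file3.csv", "w")
  # Close only the connections opened here (closeAllConnections() would
  # also close any unrelated connections the session has open).
  on.exit({close(f); close(f1); close(f2); close(f3)})
  while ({x <- scan(f, sep = ",", nlines = 1, what = "character", quiet = TRUE)
          length(x) > 0}) {
    # append is ignored for connections: writes go to the open stream.
    write(x[1], file = f1, ncolumns = 1)
    write(x[2], file = f2, ncolumns = 1)
    write(x[3], file = f3, ncolumns = 1)
  }
}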
system.time(ver2())
## user system elapsed
## 0.257 0.098 0.409
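Both versions above hard-code three columns and name the outputs file1.csv to file3.csv rather than after the header. A more general sketch (not from the original answer; the function name split_columns and its chunk_size and outdir parameters are hypothetical) reads the header first, opens one output connection per variable named after it, and then processes the rest of the file in chunks with readLines(). It assumes a simple CSV without quoted fields that contain commas or newlines:

```r
# Split a CSV into one file per column, named after the header
# (var1.csv, var2.csv, ...). Reads chunk_size lines per call, so memory use
# depends on chunk_size rather than on the total file size.
# Assumption: no quoted fields containing commas or newlines.
split_columns <- function(infile, chunk_size = 10000L, outdir = ".") {
  con <- file(infile, open = "r")
  on.exit(close(con), add = TRUE)

  header <- readLines(con, n = 1)
  vars <- strsplit(header, ",", fixed = TRUE)[[1]]
  paths <- file.path(outdir, paste0(vars, ".csv"))

  # One open output connection per variable; each file starts with its name.
  outs <- lapply(paths, file, open = "w")
  on.exit(lapply(outs, close), add = TRUE)
  for (j in seq_along(vars)) writeLines(vars[j], outs[[j]])

  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break            # end of file
    fields <- strsplit(lines, ",", fixed = TRUE)
    for (j in seq_along(vars)) {
      # j-th field of every line in the chunk (NA if a line is short,
      # e.g. because strsplit() drops a trailing empty field)
      col <- vapply(fields, function(f) f[j], character(1))
      writeLines(col, outs[[j]])
    }
  }
  invisible(paths)
}

# Usage: split_columns("file.csv")  # writes var1.csv, var2.csv, var3.csv
```

Reading thousands of lines per call should amortize the per-call overhead that dominates both loops above, while the number of open connections grows only with the number of columns.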