Question

I'm trying to read a CSV file larger than 6 GB in order to do some aggregation. I'm using the following approaches:

read.table('csv_file', sep = ",", header = TRUE, stringsAsFactors = FALSE)
read.csv("csv_file", as.is = TRUE, header = FALSE, quote = "")

However, whichever approach I use, I get errors like the following:

Warning message:

In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
EOF within quoted string

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
more columns than column names

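I've seen many people report similar errors, but so far none of the suggested fixes have worked for me.

I'd appreciate it if anyone could shed some light on this. Thanks in advance.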

Answer 1

To get some insight into what is going on, use this construct:

table( count.fields("~/Downloads/test1.txt", sep=",", quote="", comment.char="", skip=0) )

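If only a few lines are odd, you can narrow it down with different 'skip' values. I haven't used this on a file this large, but I have used it on files about half that size. The count.fields result can also be used to find the line numbers that show a particular discrepancy in the number of fields. If you get something indicating that one line in 10 has 20 fewer columns than expected, then do this (replace 9 with whatever anomalous field count table() reported):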

which( 
  count.fields("~/Downloads/test1.txt", sep=",", quote="", comment.char="", skip=0) == 9)
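A minimal, self-contained sketch of this workflow, using a small hypothetical file written to tempfile() in place of the real 6 GB CSV (the file contents are made up for illustration). Instead of hard-coding 9, it flags every line whose field count differs from the header line's. Note that count.fields skips blank lines by default (blank.lines.skip = TRUE), so the reported positions can drift from raw line numbers if the file contains blank lines.

```r
# Hypothetical demo file: one row with a stray quote (plus an extra field)
# and one row that is a field short.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "a,b,c",
  "1,2,3",
  "4,\"5,6,7",  # stray quote; with quote = "" this counts as 4 fields
  "7,8",        # only 2 fields
  "9,10,11"
), path)

# Count fields per line with quoting and comments disabled, so a stray
# quote cannot swallow the rest of the file.
n <- count.fields(path, sep = ",", quote = "", comment.char = "", skip = 0)
print(table(n))

# Positions (counted from the header, since skip = 0) whose field count
# differs from the header line's.
bad <- which(n != n[1])
print(bad)
```

Once the offending lines are known, you can inspect them with readLines() and decide whether to fix the file, adjust quote, or use fill = TRUE in read.csv.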

Answer 2

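The header may not be on line 1; you may need to skip a certain number of lines to reach it (via the skip argument of read.csv/read.table).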
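A minimal sketch, assuming a hypothetical file with two preamble lines above the real header: peek at the top of the file with readLines() to locate the header, then pass the number of lines before it to skip. Without skip, read.csv would take the first preamble line as a one-column header, which is one way to end up with "more columns than column names".

```r
# Hypothetical file: two preamble lines before the real header "id,value".
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Report exported by some tool",
  "Second preamble line",
  "id,value",
  "1,10",
  "2,20"
), path)

# Look at the first few lines to see where the header actually starts.
print(readLines(path, n = 5))

# Skip the two preamble lines so "id,value" is read as the header.
df <- read.csv(path, skip = 2, header = TRUE, stringsAsFactors = FALSE)
str(df)
```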

Error when reading a large CSV file in R
