Question

I have a problem reading a large file (close to 2,000,000 lines) with the readr package.

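Why do I want to use readr? My data file can contain ASCII control characters (0x1A, i.e. ASCII 26, Ctrl+Z) that stop read.table() partway through the file, and I noticed that the readr package is not affected by this problem.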

My file has rows of different lengths, so I would use fill=TRUE if I could use read.table().

I tried read_table from the readr package, without success: it does not seem to recognize whitespace as the column separator.

I also tried read_delim(file,delim=" "). With this code the separator is found, but the first row is taken as the reference length for my data frame, so the remaining rows are truncated to that length.

Does anyone have an idea?

Answer 1

I managed to read my data (from a file named file) into a data frame (rtcm1) with the following code:

 library(readr)

 # Vector of column names. Its main purpose here is to set the number of columns used when importing the file.
 col <- paste("V", 1:17, sep = "")

 # read_delim from readr with a single space as the separator. I don't really know why, but quote = "" is needed to get all of my data, perhaps so that " is not treated as a quoting character.
 rtcm1 <- read_delim(file, delim = " ", col_names = col, quote = "")

With this solution, cells that have no data are filled with NA and the function emits warnings, but it seems to work well.
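
The 17 in col fixes how many columns are imported, so it needs to be at least the length of the longest row. If that number is not known in advance, one possible way to find it is to count the fields on each line first. This is only a sketch under assumptions: the temporary file tmp and the name n_fields are made up for illustration, and fields are assumed to be separated by exactly one space.

```r
library(readr)

# Hypothetical sample file with rows of different lengths
tmp <- tempfile(fileext = ".txt")
writeLines(c("a b c", "d e", "f g h i"), tmp)

# Read the raw lines with readr, then count space-separated fields per line
n_fields <- lengths(strsplit(read_lines(tmp), " ", fixed = TRUE))
max(n_fields)  # 4

# One column name per field of the longest row
col <- paste("V", seq_len(max(n_fields)), sep = "")
```

The resulting col can then be passed as col_names to read_delim as in the code above. Note that this holds every line of the file in memory once, which may matter for a file of about 2,000,000 lines.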

Use readr to import a large data file with different row lengths and whitespace as the separator
