Why does fread not honor the skip argument?

Time: 2017-06-16 19:18:35

Tags: r data.table fread

I have a .txt dataset in which the first 12 lines are text, followed by 2 blank lines, and then the data

DATE           HEIGHT    INPUT     OUTPUT  TESTMEASURE
01/01/1933  NO RECORD   NO RECORD   MISSING     MISSING
01/02/1933  NO RECORD   NO RECORD   MISSING     MISSING

But when I run

dat <- fread('data.txt')

it skips 15 lines and uses the first data row as the column names of the imported dataset. The header row is ignored.

01/01/1933  NO RECORD   NO RECORD   MISSING     MISSING

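The skip argument has no effect on what gets imported. How can I specify the line number that should be used for the column names? Alternatively, I could rename the columns myself, but then the first row of data must not be dropped.

Diagnostics (verbose output)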

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.001319 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... '\t'
Detected 5 columns. Longest stretch was from line 15 to line 30
Starting data input on line 15 (either column names or first row of data). First 10 characters: 01/01/1933
The line before starting line 15 is non-empty and will be ignored (it has too few or too many items to be column names or data): DATE           HEIGHT    INPUT    OUTPUT  TESTMEASURE
the fields on line 15 are character fields. Treating as the column names.

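1 answer:

Answer 0 (score: 2)

You have 12 lines of text, 2 blank lines, and then your data. But I notice that there is extra whitespace between DATE and HEIGHT. So I made a text file like the one below, where the data is tab-delimited, with 2 tabs between DATE and HEIGHT instead of 1 tab: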

garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage


DATE        HEIGHT  INPUT   OUTPUT  TESTMEASURE
01/01/1933  NO RECORD   NO RECORD   MISSING MISSING
01/02/1933  NO RECORD   NO RECORD   MISSING MISSING

Calling fread(data) on it gives me:

fread(data)
   01/01/1933 NO RECORD NO RECORD MISSING MISSING
1: 01/02/1933 NO RECORD NO RECORD MISSING MISSING
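
One plausible reading of the "too few or too many items" message is that the header splits into a different number of tab-separated fields than the data rows do. A small check of that idea, using stand-in strings copied from the example file above (the doubled tab in the header is the assumption under test, not something confirmed for the original data.txt):

```r
# Stand-in lines: the header with a doubled tab, and one data row
header <- "DATE\t\tHEIGHT\tINPUT\tOUTPUT\tTESTMEASURE"
row1   <- "01/01/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING"

# Count the tab-separated fields on each line; the doubled tab
# produces an extra empty field in the header
n_fields <- lengths(strsplit(c(header, row1), "\t", fixed = TRUE))
print(n_fields)
```

On a real file the same count could be taken over readLines('data.txt') to see which lines disagree with the rest.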

Removing the extra tab between DATE and HEIGHT gives me:

         DATE    HEIGHT     INPUT  OUTPUT TESTMEASURE
1: 01/01/1933 NO RECORD NO RECORD MISSING     MISSING
2: 01/02/1933 NO RECORD NO RECORD MISSING     MISSING
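
If editing the file by hand is not an option, one possible workaround is to read the header line separately and hand the names to fread. The following is only a sketch under assumptions: it rebuilds the example file in a temporary location, assumes the header is the first line starting with DATE, and assumes that runs of tabs in the header can safely be collapsed. The variable names path, header_line, and col_names are made up for illustration.

```r
library(data.table)

# Recreate the example: 12 text lines, 2 blank lines, a header with a
# doubled tab, then tab-separated data rows
path <- tempfile(fileext = ".txt")
writeLines(c(
  rep("garbage", 12), "", "",
  "DATE\t\tHEIGHT\tINPUT\tOUTPUT\tTESTMEASURE",
  "01/01/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING",
  "01/02/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING"
), path)

# Locate the header line and split it on runs of tabs
lines       <- readLines(path)
header_line <- grep("^DATE", lines)[1]
col_names   <- strsplit(lines[header_line], "\t+")[[1]]

# Skip everything up to and including the header, read the rest as
# plain data, and attach the names recovered above
dat <- fread(path, skip = header_line, header = FALSE, sep = "\t",
             col.names = col_names)
print(dat)
```

Because the header is never parsed by fread here, the first data row cannot be mistaken for column names, which is the behavior the question asks for.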