EOF-related error?

Asked: 2016-12-28 15:22:39

Tags: r import data.table fread

I am trying to quickly read the MTA turnstile data linked from the following page:

http://web.mta.info/developers/turnstile.html

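My plan was to loop over the files and run fread or download.file to store the data and bind it together, but some of the files give me errors. Here are two examples, one that works and one that does not. I noticed that the second file looks a bit different: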

test_mta_works = fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_161224.txt", sep = ',')

test_mta_wont_work = fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_140419.txt", sep = ',')

The second call fails with this error:

Error in fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_140419.txt",  : 
  Expected sep (',') but new line, EOF (or other non printing character) ends field 12 when detecting types from point 0: A002,R051,02-00-00,04-18-14,16:00:00,REGULAR,004575433,001558298,04-18-14,20:00:00,REGULAR,004575838,001558374  

Any ideas what the problem might be and/or how to fix it? I tried fill = T, but that created problems in the data.

Thanks!

Edit

With fill = T, I get output like this:

     V1   V2       V3       V4       V5      V6      V7      V8       V9      V10     V11     V12     V13      V14      V15     V16     V17     V18      V19      V20
1: A002 R051 02-00-00 04-12-14 00:00:00 REGULAR 4566812 1555499 04-12-14 04:00:00 REGULAR 4566850 1555508 04-12-14 08:00:00 REGULAR 4566875 1555536 04-12-14 12:00:00
2: A002 R051 02-00-00 04-13-14 08:00:00 REGULAR 4567968 1555789 04-13-14 12:00:00 REGULAR 4568069 1555842 04-13-14 16:00:00 REGULAR 4568278 1555903 04-13-14 20:00:00
3: A002 R051 02-00-00 04-14-14 16:00:00 REGULAR 4569148 1556362 04-14-14 20:00:00 REGULAR 4569786 1556420 04-15-14 00:00:00 REGULAR 4569949 1556447 04-15-14 04:00:00
4: A002 R051 02-00-00 04-16-14 00:00:00 REGULAR 4571423 1556965 04-16-14 04:00:00 REGULAR 4571442 1556966 04-16-14 08:00:00 REGULAR 4571486 1557049 04-16-14 12:00:00
5: A002 R051 02-00-00 04-17-14 08:00:00 REGULAR 4573294 1557587 04-17-14 12:00:00 REGULAR 4573469 1557848 04-17-14 16:00:00 REGULAR 4573800 1557901 04-17-14 20:00:00
6: A002 R051 02-00-00 04-18-14 16:00:00 REGULAR 4575433 1558298 04-18-14 20:00:00 REGULAR 4575838 1558374                                NA      NA      

Meanwhile, the first file, which does not need fill = T, gives the following:

      C/A UNIT      SCP       STATION LINENAME DIVISION       DATE     TIME    DESC ENTRIES   EXITS
 1:  A002 R051 02-00-00         59 ST  NQR456W      BMT 12/17/2016 03:00:00 REGULAR 5967477 2022101
 2:  A002 R051 02-00-00         59 ST  NQR456W      BMT 12/17/2016 07:00:00 REGULAR 5967485 2022116
 3:  A002 R051 02-00-00         59 ST  NQR456W      BMT 12/17/2016 11:00:00 REGULAR 5967553 2022233
 4:  A002 R051 02-00-00         59 ST  NQR456W      BMT 12/17/2016 15:00:00 REGULAR 5967790 2022331
 5:  A002 R051 02-00-00         59 ST  NQR456W      BMT 12/17/2016 19:00:00 REGULAR 5968186 2022421           
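
One way to confirm that the two files are laid out differently is to count the comma-separated fields on each line. The sketch below is only an illustration. The second sample line is the one quoted in the error message; the first is assembled from row 1 of the fill = T output, and its zero-padded counters are an assumption based on that error message. The names `sample_lines` and `n_fields` are made up for this example. For a real file, `readLines()` on the URL could supply the lines instead.

```r
# Two lines in the older layout: three ID fields followed by repeated
# blocks of five fields (date, time, description, entries, exits).
sample_lines <- c(
  # assembled from row 1 of the fill = T output (three readings)
  "A002,R051,02-00-00,04-12-14,00:00:00,REGULAR,004566812,001555499,04-12-14,04:00:00,REGULAR,004566850,001555508,04-12-14,08:00:00,REGULAR,004566875,001555536",
  # the line quoted in the fread error message (two readings)
  "A002,R051,02-00-00,04-18-14,16:00:00,REGULAR,004575433,001558298,04-18-14,20:00:00,REGULAR,004575838,001558374"
)

# Split each line on literal commas and count the pieces.
n_fields <- lengths(strsplit(sample_lines, ",", fixed = TRUE))

# Tabulate how many lines have each field count.
print(table(n_fields))
```

If the counts differ from line to line, the file is ragged, which matches what the fread error reports: a line ended before the expected number of separators was seen.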

1 answer:

Answer 0: (score: 2)

Pass na.strings as an argument to fread, together with fill = TRUE, so that the blank padding fields are read as NA:

test_mta_wont_work = fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_140419.txt", sep = ',', fill = TRUE, na.strings = c("", "NA"))

head(test_mta_wont_work)

     V1   V2       V3       V4       V5      V6      V7      V8       V9      V10     V11
1: A002 R051 02-00-00 04-12-14 00:00:00 REGULAR 4566812 1555499 04-12-14 04:00:00 REGULAR
2: A002 R051 02-00-00 04-13-14 08:00:00 REGULAR 4567968 1555789 04-13-14 12:00:00 REGULAR
3: A002 R051 02-00-00 04-14-14 16:00:00 REGULAR 4569148 1556362 04-14-14 20:00:00 REGULAR
4: A002 R051 02-00-00 04-16-14 00:00:00 REGULAR 4571423 1556965 04-16-14 04:00:00 REGULAR
5: A002 R051 02-00-00 04-17-14 08:00:00 REGULAR 4573294 1557587 04-17-14 12:00:00 REGULAR
6: A002 R051 02-00-00 04-18-14 16:00:00 REGULAR 4575433 1558298 04-18-14 20:00:00 REGULAR
       V12     V13      V14      V15     V16     V17     V18      V19      V20     V21     V22
1: 4566850 1555508 04-12-14 08:00:00 REGULAR 4566875 1555536 04-12-14 12:00:00 REGULAR 4567031
2: 4568069 1555842 04-13-14 16:00:00 REGULAR 4568278 1555903 04-13-14 20:00:00 REGULAR 4568507
3: 4569786 1556420 04-15-14 00:00:00 REGULAR 4569949 1556447 04-15-14 04:00:00 REGULAR 4569966
4: 4571442 1556966 04-16-14 08:00:00 REGULAR 4571486 1557049 04-16-14 12:00:00 REGULAR 4571666
5: 4573469 1557848 04-17-14 16:00:00 REGULAR 4573800 1557901 04-17-14 20:00:00 REGULAR 4574676
6: 4575838 1558374       NA       NA      NA      NA      NA       NA       NA      NA      NA
       V23      V24      V25     V26     V27     V28      V29      V30     V31     V32     V33
1: 1555629 04-12-14 16:00:00 REGULAR 4567347 1555694 04-12-14 20:00:00 REGULAR 4567736 1555738
2: 1555953 04-14-14 00:00:00 REGULAR 4568639 1555975 04-14-14 04:00:00 REGULAR 4568657 1555979
3: 1556449 04-15-14 08:00:00 REGULAR 4569998 1556529 04-15-14 12:00:00 REGULAR 4570176 1556774
4: 1557328 04-16-14 16:00:00 REGULAR 4572020 1557392 04-16-14 20:00:00 REGULAR 4572975 1557459
5: 1557989 04-18-14 00:00:00 REGULAR 4574912 1558020 04-18-14 04:00:00 REGULAR 4574943 1558020
6:      NA       NA       NA      NA      NA      NA       NA       NA      NA      NA      NA
        V34      V35     V36     V37     V38      V39      V40     V41     V42     V43
1: 04-13-14 00:00:00 REGULAR 4567914 1555770 04-13-14 04:00:00 REGULAR 4567952 1555773
2: 04-14-14 08:00:00 REGULAR 4568697 1556064 04-14-14 12:00:00 REGULAR 4568858 1556308
3: 04-15-14 16:00:00 REGULAR 4570437 1556855 04-15-14 20:00:00 REGULAR 4571260 1556938
4: 04-17-14 00:00:00 REGULAR 4573228 1557492 04-17-14 04:00:00 REGULAR 4573250 1557497
5: 04-18-14 08:00:00 REGULAR 4574977 1558080 04-18-14 12:00:00 REGULAR 4575130 1558233
6:       NA       NA      NA      NA      NA       NA       NA      NA      NA      NA
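
The 43 columns above look like three ID fields followed by up to eight blocks of five fields each (3 + 8 * 5 = 43). Assuming that reading is right, the wide table can be stacked into the one-reading-per-row shape of the newer files with melt from data.table. This is a rough sketch, not a tested recipe for the full file: the sample lines are abridged (the first is assembled from row 1 above with assumed zero padding, the second is the line from the error message), the column names are borrowed from the header of the newer file, and `sample_lines`, `wide`, `long`, `n_blocks` and `first` are names made up for this example.

```r
library(data.table)

# Abridged sample in the older wide layout (see the assumptions above).
sample_lines <- c(
  "A002,R051,02-00-00,04-12-14,00:00:00,REGULAR,004566812,001555499,04-12-14,04:00:00,REGULAR,004566850,001555508,04-12-14,08:00:00,REGULAR,004566875,001555536",
  "A002,R051,02-00-00,04-18-14,16:00:00,REGULAR,004575433,001558298,04-18-14,20:00:00,REGULAR,004575838,001558374"
)

# Read every column as character so all blocks share one type;
# fill = TRUE pads short lines and na.strings turns the padding into NA.
wide <- fread(text = sample_lines, sep = ",", header = FALSE, fill = TRUE,
              colClasses = "character", na.strings = c("", "NA"))

# Name the three ID columns (names taken from the newer file's header).
id_cols <- c("C/A", "UNIT", "SCP")
setnames(wide, 1:3, id_cols)

# Column positions of the first field of each five-field block.
n_blocks <- (ncol(wide) - 3L) %/% 5L
first <- 4L + 5L * (seq_len(n_blocks) - 1L)

# Stack the blocks: one output column per position within a block.
long <- melt(
  wide,
  id.vars = id_cols,
  measure.vars = list(first, first + 1L, first + 2L, first + 3L, first + 4L),
  value.name = c("DATE", "TIME", "DESC", "ENTRIES", "EXITS")
)

# Drop the padding rows and the block index, then convert the counters.
long <- long[!is.na(DATE) & DATE != ""]
long[, variable := NULL]
long[, c("ENTRIES", "EXITS") := lapply(.SD, as.integer),
     .SDcols = c("ENTRIES", "EXITS")]

print(long)
```

The result still lacks the STATION, LINENAME and DIVISION columns that appear in the newer file, so binding old and new files together would need rbindlist with fill = TRUE or a separate lookup for those fields.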