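I'm trying to quickly access the MTA turnstile data on the following page:
http://web.mta.info/developers/turnstile.html
I've been planning to loop through the page and run fread or download.file on each file to store and bind the data, but I get errors on some of the files. Here are two examples, one that works and one that doesn't. I noticed that the second file looks a bit different: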
test_mta_works = fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_161224.txt", sep = ',')
test_mta_wont_work = fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_140419.txt", sep = ',')
The error I receive on the second one:
Error in fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_140419.txt", :
Expected sep (',') but new line, EOF (or other non printing character) ends field 12 when detecting types from point 0: A002,R051,02-00-00,04-18-14,16:00:00,REGULAR,004575433,001558298,04-18-14,20:00:00,REGULAR,004575838,001558374
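Any ideas what the problem might be and/or how to solve it? I tried using fill = T, but that created problems in the data.
Thanks!
When using fill = T, I get output like the following: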
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1: A002 R051 02-00-00 04-12-14 00:00:00 REGULAR 4566812 1555499 04-12-14 04:00:00 REGULAR 4566850 1555508 04-12-14 08:00:00 REGULAR 4566875 1555536 04-12-14 12:00:00
2: A002 R051 02-00-00 04-13-14 08:00:00 REGULAR 4567968 1555789 04-13-14 12:00:00 REGULAR 4568069 1555842 04-13-14 16:00:00 REGULAR 4568278 1555903 04-13-14 20:00:00
3: A002 R051 02-00-00 04-14-14 16:00:00 REGULAR 4569148 1556362 04-14-14 20:00:00 REGULAR 4569786 1556420 04-15-14 00:00:00 REGULAR 4569949 1556447 04-15-14 04:00:00
4: A002 R051 02-00-00 04-16-14 00:00:00 REGULAR 4571423 1556965 04-16-14 04:00:00 REGULAR 4571442 1556966 04-16-14 08:00:00 REGULAR 4571486 1557049 04-16-14 12:00:00
5: A002 R051 02-00-00 04-17-14 08:00:00 REGULAR 4573294 1557587 04-17-14 12:00:00 REGULAR 4573469 1557848 04-17-14 16:00:00 REGULAR 4573800 1557901 04-17-14 20:00:00
6: A002 R051 02-00-00 04-18-14 16:00:00 REGULAR 4575433 1558298 04-18-14 20:00:00 REGULAR 4575838 1558374 NA NA
Meanwhile, the first file, which doesn't need fill = T, gives the following:
C/A UNIT SCP STATION LINENAME DIVISION DATE TIME DESC ENTRIES EXITS
1: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 03:00:00 REGULAR 5967477 2022101
2: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 07:00:00 REGULAR 5967485 2022116
3: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 11:00:00 REGULAR 5967553 2022233
4: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 15:00:00 REGULAR 5967790 2022331
5: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 19:00:00 REGULAR 5968186 2022421
Answer 0 (score: 2)
Use na.strings as an argument to fread, together with fill = TRUE:
test_mta_wont_work = fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_140419.txt", sep = ',', fill = TRUE, na.strings = c("", "NA"))
head(test_mta_wont_work)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1: A002 R051 02-00-00 04-12-14 00:00:00 REGULAR 4566812 1555499 04-12-14 04:00:00 REGULAR
2: A002 R051 02-00-00 04-13-14 08:00:00 REGULAR 4567968 1555789 04-13-14 12:00:00 REGULAR
3: A002 R051 02-00-00 04-14-14 16:00:00 REGULAR 4569148 1556362 04-14-14 20:00:00 REGULAR
4: A002 R051 02-00-00 04-16-14 00:00:00 REGULAR 4571423 1556965 04-16-14 04:00:00 REGULAR
5: A002 R051 02-00-00 04-17-14 08:00:00 REGULAR 4573294 1557587 04-17-14 12:00:00 REGULAR
6: A002 R051 02-00-00 04-18-14 16:00:00 REGULAR 4575433 1558298 04-18-14 20:00:00 REGULAR
V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
1: 4566850 1555508 04-12-14 08:00:00 REGULAR 4566875 1555536 04-12-14 12:00:00 REGULAR 4567031
2: 4568069 1555842 04-13-14 16:00:00 REGULAR 4568278 1555903 04-13-14 20:00:00 REGULAR 4568507
3: 4569786 1556420 04-15-14 00:00:00 REGULAR 4569949 1556447 04-15-14 04:00:00 REGULAR 4569966
4: 4571442 1556966 04-16-14 08:00:00 REGULAR 4571486 1557049 04-16-14 12:00:00 REGULAR 4571666
5: 4573469 1557848 04-17-14 16:00:00 REGULAR 4573800 1557901 04-17-14 20:00:00 REGULAR 4574676
6: 4575838 1558374 NA NA NA NA NA NA NA NA NA
V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33
1: 1555629 04-12-14 16:00:00 REGULAR 4567347 1555694 04-12-14 20:00:00 REGULAR 4567736 1555738
2: 1555953 04-14-14 00:00:00 REGULAR 4568639 1555975 04-14-14 04:00:00 REGULAR 4568657 1555979
3: 1556449 04-15-14 08:00:00 REGULAR 4569998 1556529 04-15-14 12:00:00 REGULAR 4570176 1556774
4: 1557328 04-16-14 16:00:00 REGULAR 4572020 1557392 04-16-14 20:00:00 REGULAR 4572975 1557459
5: 1557989 04-18-14 00:00:00 REGULAR 4574912 1558020 04-18-14 04:00:00 REGULAR 4574943 1558020
6: NA NA NA NA NA NA NA NA NA NA NA
V34 V35 V36 V37 V38 V39 V40 V41 V42 V43
1: 04-13-14 00:00:00 REGULAR 4567914 1555770 04-13-14 04:00:00 REGULAR 4567952 1555773
2: 04-14-14 08:00:00 REGULAR 4568697 1556064 04-14-14 12:00:00 REGULAR 4568858 1556308
3: 04-15-14 16:00:00 REGULAR 4570437 1556855 04-15-14 20:00:00 REGULAR 4571260 1556938
4: 04-17-14 00:00:00 REGULAR 4573228 1557492 04-17-14 04:00:00 REGULAR 4573250 1557497
5: 04-18-14 08:00:00 REGULAR 4574977 1558080 04-18-14 12:00:00 REGULAR 4575130 1558233
6: NA NA NA NA NA NA NA NA NA NA
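Judging from the printed columns above, the older file appears to put several readings on one line: three identifying fields followed by repeated groups of five fields (date, time, description, entries, exits). That layout is inferred from the output, not confirmed by MTA documentation. Under that assumption, a minimal sketch of stacking the groups into one row per reading might look like the following. The helper name wide_to_long, its parameters, and the column names (borrowed from the newer file's header) are hypothetical, and the two sample lines are taken from the output above, with the first truncated to three readings.

```r
library(data.table)

# Sample lines in the older layout, copied from the output shown above.
# The first line is truncated to three readings to keep the example short.
sample_text <- paste(
  "A002,R051,02-00-00,04-12-14,00:00:00,REGULAR,4566812,1555499,04-12-14,04:00:00,REGULAR,4566850,1555508,04-12-14,08:00:00,REGULAR,4566875,1555536",
  "A002,R051,02-00-00,04-18-14,16:00:00,REGULAR,004575433,001558298,04-18-14,20:00:00,REGULAR,004575838,001558374",
  sep = "\n"
)

# Read the ragged lines; short lines are padded with NA.
wide <- fread(text = sample_text, sep = ",", header = FALSE, fill = TRUE,
              na.strings = c("", "NA"))

# Hypothetical helper: assumes n_id identifier columns followed by
# repeated groups of group_size columns (DATE, TIME, DESC, ENTRIES, EXITS).
wide_to_long <- function(wide, n_id = 3L, group_size = 5L) {
  n_groups <- (ncol(wide) - n_id) %/% group_size
  pieces <- lapply(seq_len(n_groups), function(g) {
    first <- n_id + (g - 1L) * group_size + 1L
    cols  <- c(seq_len(n_id), seq(first, length.out = group_size))
    piece <- wide[, cols, with = FALSE]
    setnames(piece, c("C/A", "UNIT", "SCP", "DATE", "TIME", "DESC", "ENTRIES", "EXITS"))
    # Remember the source line and the position of the reading within it.
    piece[, `:=`(line = .I, reading = g)]
    piece
  })
  long <- rbindlist(pieces)
  # Drop the padding that fill = TRUE added to short lines.
  long <- long[!is.na(DATE) & DATE != ""]
  # Restore the original reading order, then remove the helper columns.
  setorder(long, line, reading)
  long[, `:=`(line = NULL, reading = NULL)]
  long[, `:=`(ENTRIES = as.integer(ENTRIES), EXITS = as.integer(EXITS))]
  long[]
}

long <- wide_to_long(wide)
print(long)
```

The result has the same eight leading columns as the newer file minus STATION, LINENAME and DIVISION, which the older layout does not seem to carry. If dates need to be comparable across files, as.IDate(DATE, format = "%m-%d-%y") should parse the older two-digit-year format.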