fread() fails where read.csv() works: it says EOL ends a field?

Time: 2016-04-26 19:47:31

Tags: r data.table

I'm running into a problem with this dataset after downloading it: http://data.insideairbnb.com/france/ile-de-france/paris/2015-09-02/data/listings.csv.gz

Then I ran:

library(data.table)
data.table 1.9.6  For help type ?data.table or https://github.com/Rdatatable/data.table/wiki
The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
d3 = fread("~/Downloads/listings-2.csv",verbose=TRUE)
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.120920 GB.
Memory mapping ... ok
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... ','
Detected 92 columns. Longest stretch was from line 1 to line 30
Starting data input on line 1 (either column names or first row of data). First 10 characters: id,listing
All the fields on line 1 are character fields. Treating as the column names.
Count of eol: 87973 (including 1 at the end)
Count of sep: 4879857
nrow = MIN( nsep [4879857] / ncol [92] -1, neol [87973] - nblank [1] ) = 53624
Type codes (   first 5 rows): 14244444444444441444444444444114444440441444433444131144144444141144111141441111111404444413
Type codes (+ middle 5 rows): 14244444444444441444444444444114444440441444433444131144144444141144111141441111111404444413
Type codes (+   last 5 rows): 14244444444444441444444444444114444440441444433444131144144444141144111141441111111404444413
Type codes: 14244444444444441444444444444114444440441444433444131144144444141144111141441111111404444413 (after applying colClasses and integer64)
Type codes: 14244444444444441444444444444114444440441444433444131144144444141144111141441111111404444413 (after applying drop or select (if     supplied)
Allocating 92 column slots (92 - 0 dropped)
Bumping column 41 from INT to INT64 on data row 299, field contains '""'
Bumping column 41 from INT64 to REAL on data row 299, field contains '""'
Bumping column 41 from REAL to STR on data row 299, field contains '""'
Bumping column 85 from LGL to INT on data row 16461, field contains '0143676020'
Bumping column 17 from INT to INT64 on data row 21785, field contains '""Safety Card""'
Bumping column 17 from INT64 to REAL on data row 21785, field contains '""Safety Card""'
Bumping column 17 from REAL to STR on data row 21785, field contains '""Safety Card""'
Bumping column 30 from INT to INT64 on data row 21785, field contains 't'
Bumping column 30 from INT64 to REAL on data row 21785, field contains 't'
Bumping column 30 from REAL to STR on data row 21785, field contains 't'
Bumping column 38 from LGL to INT on data row 21785, field contains '2015-07-17'
Bumping column 38 from INT to INT64 on data row 21785, field contains '2015-07-17'
Bumping column 38 from INT64 to REAL on data row 21785, field contains '2015-07-17'
Bumping column 38 from REAL to STR on data row 21785, field contains '2015-07-17'
Bumping column 46 from REAL to STR on data row 21785, field contains 'f'
Bumping column 51 from INT to INT64 on data row 21785, field contains 'f'
Bumping column 51 from INT64 to REAL on data row 21785, field contains 'f'
Bumping column 51 from REAL to STR on data row 21785, field contains 'f'
Bumping column 52 from REAL to STR on data row 21785, field contains 'f'
Bumping column 54 from INT to INT64 on data row 21785, field contains '1.99'
Bumping column 54 from INT64 to REAL on data row 21785, field contains '1.99'
Error in fread("~/Downloads/listings-2.csv", verbose = TRUE) : 
  Expected sep (',') but new line or EOF ends field 54 on line 21786 when reading data: 4916075,https://www.airbnb.com/rooms/4916075,20150902193246,2015-09-03,QUARTIER LATIN-SAINT GERMAIN,"Studio de 25 mètres carré, situé au 3éme étage avec ascenseur dans un joli bâtiment, idéal pour un couple de voyageurs avec un petit budget. ","Dans la pièce principale, il y a un lit sofa pour 2 personnes. Cuisine équipée rénové et décoré en 2012 avec accès illimité à Internet.  L'apartement se situe dans un inmueble avec ascenseur, vous allez avoir à votre disposition linge et serviettes propre.  L'appartement à Paris est situé rue du Cardinal Lemoine. L'appartement est très proche de la rue Mouffetard, une des rues les plus pittoresques de Paris et située au coeur même du Quartier Latin. Le studio à louer est proche de La Sorbonne, Le Panthéon, il y a des restaurants et des cafés, des librairies, des cinémas, des théâtres et des marchés en plein air avec tout 

This is clearly a messy dataset. However, read.csv does work: d2 = read.csv("~/Downloads/listings-2.csv"), and at first glance the result looks fine. Any ideas?
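One way to narrow this down might be to check whether the file has quoted fields that span several physical lines or contain doubled quotes. The log hints at both (87973 end-of-line characters against an estimated 53624 rows, and fields such as '""Safety Card""'), but it does not confirm either. The sketch below uses only base R on a tiny made-up file; the file contents and the column names are hypothetical and stand in for the real listings file.

```r
# Hypothetical three-column CSV whose last record has a quoted field
# containing doubled quotes ("") and an embedded line break.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  'id,description,price',
  '1,"plain text",10',
  '2,"has ""Safety Card"" and',
  'a line break",1.99'
), tmp)

# Physical lines in the file versus records parsed by read.csv.
n_lines <- length(readLines(tmp))
d <- read.csv(tmp, stringsAsFactors = FALSE)
cat("physical lines:", n_lines, " parsed rows:", nrow(d), "\n")

# Per-line field counts with quote handling switched on; lines whose
# count differs from the header's count are candidates for inspection.
fields <- count.fields(tmp, sep = ",", quote = "\"")
print(fields)
```

If the real file shows far more physical lines than parsed rows, multi-line quoted fields would be a plausible reason why a parser that splits on every newline reports that a new line ends a field.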

0 answers:

No answers yet.