readr::read_csv() - parsing failure with nested quotes

Date: 2018-12-04 16:58:34

Tags: r readr

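I have a csv where some columns contain a quote nested inside another quote:

"blah blah "nested quote"", and this produces parsing failures. I'm not sure whether this is a bug or something I need to work around?

Reprex (the file is here, or see the content pasted below):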

readr::read_csv("~/temp/shittyquotes.csv")
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   INSTNM = col_character(),
#>   ADDR = col_character(),
#>   CITY = col_character(),
#>   STABBR = col_character(),
#>   ZIP = col_character(),
#>   CHFNM = col_character(),
#>   CHFTITLE = col_character(),
#>   EIN = col_character(),
#>   OPEID = col_character(),
#>   WEBADDR = col_character(),
#>   ADMINURL = col_character(),
#>   FAIDURL = col_character(),
#>   APPLURL = col_character(),
#>   ACT = col_character(),
#>   IALIAS = col_character(),
#>   INSTCAT = col_character(),
#>   CCBASIC = col_character(),
#>   CCIPUG = col_character(),
#>   CCSIZSET = col_character(),
#>   CARNEGIE = col_character()
#>   # ... with 2 more columns
#> )
#> See spec(...) for full column specifications.
#> Warning: 3 parsing failures.
#> row    col           expected      actual                      file
#>   2 IALIAS delimiter or quote C           '~/temp/shittyquotes.csv'
#>   2 IALIAS delimiter or quote D           '~/temp/shittyquotes.csv'
#>   2 NA     59 columns         100 columns '~/temp/shittyquotes.csv'
#> # A tibble: 2 x 59
#>   UNITID INSTNM ADDR  CITY  STABBR ZIP    FIPS OBEREG CHFNM CHFTITLE
#>    <dbl> <chr>  <chr> <chr> <chr>  <chr> <dbl>  <dbl> <chr> <chr>   
#> 1 441238 City … 1500… Duar… CA     9101…     6      8 Dr. … Director
#> 2 441247 Commu… 3800… Mode… CA     9535…     6      8 Vict… Preside…
#> # ... with 49 more variables: GENTELE <dbl>, EIN <chr>, OPEID <chr>,
#> #   OPEFLAG <dbl>, WEBADDR <chr>, ADMINURL <chr>, FAIDURL <chr>,
#> #   APPLURL <chr>, SECTOR <dbl>, ICLEVEL <dbl>, CONTROL <dbl>,
#> #   HLOFFER <dbl>, UGOFFER <dbl>, GROFFER <dbl>, FPOFFER <dbl>,
#> #   HDEGOFFR <dbl>, DEGGRANT <dbl>, HBCU <dbl>, HOSPITAL <dbl>,
#> #   MEDICAL <dbl>, TRIBAL <dbl>, LOCALE <dbl>, OPENPUBL <dbl>, ACT <chr>,
#> #   NEWID <dbl>, DEATHYR <dbl>, CLOSEDAT <dbl>, CYACTIVE <dbl>,
#> #   POSTSEC <dbl>, PSEFLAG <dbl>, PSET4FLG <dbl>, RPTMTH <dbl>,
#> #   IALIAS <chr>, INSTCAT <chr>, CCBASIC <chr>, CCIPUG <chr>,
#> #   CCIPGRAD <dbl>, CCUGPROF <dbl>, CCENRPRF <dbl>, CCSIZSET <chr>,
#> #   CARNEGIE <chr>, TENURSYS <dbl>, LANDGRNT <dbl>, INSTSIZE <chr>,
#> #   CBSA <dbl>, CBSATYPE <chr>, CSA <dbl>, NECTA <dbl>, DFRCGID <dbl>

Created on 2018-12-04 by the reprex package (v0.2.1)

Here is the csv content as well:

UNITID,INSTNM,ADDR,CITY,STABBR,ZIP,FIPS,OBEREG,CHFNM,CHFTITLE,GENTELE,EIN,OPEID,OPEFLAG,WEBADDR,ADMINURL,FAIDURL,APPLURL,SECTOR,ICLEVEL,CONTROL,HLOFFER,UGOFFER,GROFFER,FPOFFER,HDEGOFFR,DEGGRANT,HBCU,HOSPITAL,MEDICAL,TRIBAL,LOCALE,OPENPUBL,ACT,NEWID,DEATHYR,CLOSEDAT,CYACTIVE,POSTSEC,PSEFLAG,PSET4FLG,RPTMTH,IALIAS,INSTCAT,CCBASIC,CCIPUG,CCIPGRAD,CCUGPROF,CCENRPRF,CCSIZSET,CARNEGIE,TENURSYS,LANDGRNT,INSTSIZE,CBSA,CBSATYPE,CSA,NECTA,DFRCGID 
441238,"City of Hope Graduate School of Biological Science","1500 E Duarte Rd","Duarte","CA","91010-3000", 6, 8,"Dr. Arthur Riggs","Director","6263018293","953432210","03592400",1,"gradschool.coh.org"," "," "," ",2,1,2,9,2,1,2,10,1,2,-2,2,2,21,1,"A ",-2,-2,"-2",1,1,1,1,1," ",1,25,-2,-2,-2,7,-2,-3,1,2,1,31100,1,348,-2,198
441247,"Community Business College","3800 McHenry Ave Suite M","Modesto","CA","95356-1569", 6, 8,"Victor L. Vandenberghe","President","2095293648","484-8230","03615300",7,"www.communitybusinesscollege.edu","www.communitybusinesscollege.edu","www.cbc123.com","www.123.com",9,3,3,1,1,2,2,0,2,2,-2,2,2,12,1,"A ",-2,-2,"-2",1,1,1,1,2,"formerly "Community Business School"",6,-3,-3,-3,-3,-3,-3,-3,2,2,1,33700,1,-2,-2,71
441256,"Design's School of Cosmetology","715 24th St Ste E","Paso Robles","CA","93446", 6, 8,"Sharon Skinner","Administrator","8052378575","80002030","03646300",1,"designsschool.com"," "," "," ",9,3,3,2,1,2,2,0,2,2,-2,2,2,13,1,"A ",-2,-2,"-2",1,1,1,1,2," ",6,-3,-3,-3,-3,-3,-3,-3,2,2,1,42020,1,-2,-2,46

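2 answers:

Answer 0: (score: 2)

Jim Hester provided the following answer:

You need to use the escape_double = FALSE argument to read_delim(). This isn't part of read_csv() because Excel-style csvs escape inner quotes by doubling them.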
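As a minimal sketch of that suggestion: the three-column file below is made up for illustration (the column names id, alias and size are assumptions, not the columns of the original data), and it only reproduces the unescaped inner quotes. All columns are read as character so that type guessing does not get in the way.

```r
library(readr)

# Hypothetical miniature file with the same kind of unescaped inner quotes.
path <- tempfile(fileext = ".csv")
writeLines(c(
  'id,alias,size',
  '1," ",6',
  '2,"formerly "Community Business School"",6'
), path)

# escape_double = FALSE tells the parser not to treat "" as an escaped quote.
df <- read_delim(
  path,
  delim = ",",
  escape_double = FALSE,
  col_types = cols(.default = col_character())
)

# Check that the row with nested quotes did not spill into extra columns.
print(dim(df))
```

How the inner quote characters end up in the parsed alias value may differ between readr versions, so it is worth inspecting that column after reading.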

Answer 1: (score: 2)

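data.table's fread() parses the file just fine... it throws a warning about the quotes, but you can ignore it.

data.table::fread("./temp.csv")

Warning message:
In data.table::fread("./temp.csv") :
  Found and resolved improper quoting in first 100 rows. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning.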
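The same approach can be tried on a small self-contained file. This is only a sketch: the file contents and the column names (id, alias, size) are hypothetical, and whether the warning appears may depend on the data.table version.

```r
library(data.table)

# Hypothetical miniature file with unescaped inner quotes in one field.
path <- tempfile(fileext = ".csv")
writeLines(c(
  'id,alias,size',
  '1," ",6',
  '2,"formerly "Community Business School"",6'
), path)

# fread() tries several quoting rules and may warn about improper quoting.
dt <- fread(path)

# Check that the row with nested quotes still has three fields.
print(dim(dt))
```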