read.csv.sql: filtering fields that contain commas

Time: 2014-03-07 18:42:12

Tags: r sqldf

I have not been able to solve this problem using the answers in this question or the sqldf FAQ.

LOC_NAME,BIRTH_DTTM,MOM_PAT_MRN_ID,EMPI,MOM_PAT_NAME,MOM_HOSP_ADMSN_TIME,MOM_HOSP_DISCH_TIME,DEL_PROV_NAME,ATTND_PROV_NAME,DELIVERY_TYPE,PRIM.REPT,COUNT_OF_BABIES,CHILD_PED_GEST_AGE_NUM,REASON_FOR_DEL,REASON_DEL_COM,INDUCT_METHOD,INDUCT_COM,AUGMENTATION
HOSPITAL,1/1/2000 10:00,abc,Eabc,"Surname1, Given1",1/1/2000 10:00,1/3/2000 10:00,"Doctor, First","Doctor, First","C-Section, Low Transverse",Repeat,1,38,,,1) None,,1) None
HOSPITAL,1/2/2000 11:00,def,Edef,"Surname2, Given2",1/2/2000 11:00,1/5/2000 11:00,"Doctor2, First2","Doctor2, First2","C-Section, Low Transverse",Primary,1,36,Ruptured Membranes;Labor;Other (see comment),"PPROM, Preterm labor",1) None,,1) None
HOSPITAL,1/3/2000 12:00,ghi,Eghi,"Surname3, Given3",1/3/2000 12:00,1/6/2000 12:00,"Doctor3, First3","Doctor3, First3","C-Section, Low Transverse",Repeat,1,31,Other (see comment),,1) None,,1) None
HOSPITAL,1/4/2000 13:00,jkl,Ejkl,"Surname4, Given4",1/4/2000 13:00,1/7/2000 13:00,,"Doctor4, First4","Vaginal, Spontaneous Delivery",,1,28,Other (see comment),Fetal anomaly,1) oxytocin (Pitocin),,

To read in the data, I have tried:

read.csv.sql(file) 

read.csv.sql(file, filter = 'tr.exe -d ^" ')

read.csv.sql(file, filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))

read.csv.sql(file, 
             filter = "perl -e 's{(\"[^\",]+),([^\"]+\")}{$_= $&, s/,/_/g, $_}eg'")

I am working in R 3.0.0 with RStudio Server on Ubuntu.

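Unfortunately, changing the delimiter is not an option (nor would it work well for some of the files I need to query). Some of my files are pathology reports, so I would run into this problem no matter which delimiter I used.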

Any hints about what I am missing?

1 answer:

Answer 0: (score: 1)

Try csvfix as described in sqldf FAQ #13, but use write_dsv's default separator, |, instead of ; because your file contains semicolons:

read.csv.sql("myfile.csv", sep = "|", filter = "csvfix write_dsv")
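
If csvfix is not installed, the same idea can be sketched in base R. This is an assumption, not part of the answer above: parse the quoted CSV once with read.csv, which understands the quoting, and rewrite it with | as the separator. The sample rows are a hypothetical, shortened version of the data in the question, and the names src and piped are made up for illustration. The sketch also shows why simply deleting the quotes does not help: each embedded comma becomes a field separator, so the data rows no longer line up with the header.

```r
# Hypothetical, shortened sample modeled on the question's data (not the real file).
csv_lines <- c(
  'LOC_NAME,MOM_PAT_NAME,DELIVERY_TYPE,COUNT_OF_BABIES',
  'HOSPITAL,"Surname1, Given1","C-Section, Low Transverse",1',
  'HOSPITAL,"Surname4, Given4","Vaginal, Spontaneous Delivery",1'
)
src <- tempfile(fileext = ".csv")
writeLines(csv_lines, src)

# Deleting the quotes turns every embedded comma into a separator:
# the header has 4 fields, but each data row now splits into 6.
stripped <- gsub('"', "", csv_lines, fixed = TRUE)
fields_stripped <- sapply(strsplit(stripped, ",", fixed = TRUE), length)
print(fields_stripped)

# read.csv handles the quoting, so parse once and rewrite with "|".
parsed <- read.csv(src, stringsAsFactors = FALSE, check.names = FALSE)

# The new separator must not occur anywhere in the data itself.
stopifnot(!any(grepl("|", unlist(parsed), fixed = TRUE)))

piped <- tempfile(fileext = ".psv")
write.table(parsed, piped, sep = "|", quote = FALSE, row.names = FALSE, na = "")
print(readLines(piped))
```

The rewritten file could then be passed to read.csv.sql with sep = "|" and no filter. Note that read.csv loads the whole file into memory, so for files too large for that, an external filter such as csvfix is probably the better fit.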