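I've been using SurveyGizmo, which is a very powerful online survey host. The data can be exported as a csv file, but it has two (not one) header rows. The first row specifies the question, and the second row contains the possible responses that a respondent might have checked. This looks very unusual in the data-reading world, but it seems to be normal in the survey world. How can such a file be read into R?
SurveyGizmo used to have an "old" export format that put everything on one row, but the problem I ran into recently is that the website would not export it. SurveyGizmo isn't very interested in the "old" format because it is two generations old and they don't want to support it.
On a simple survey, an intern who was helping me was able to solve the problem with the following code: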
# Read csv file with two rows of headers
# Append the second row to the first row
df <- read.csv(csvfile, skip = 1, stringsAsFactors = FALSE)  # Skip header row 1; header row 2 supplies provisional names
hl <- readLines(csvfile, 2)  # Read the two header lines as character strings
hl <- strsplit(hl, ',')  # Split headers up by commas (also splits on commas inside quoted questions)
colnames(df) <- sub('_$', '', paste(hl[[1]], hl[[2]], sep = "_"))  # Join second row to first row, dropping a trailing "_"
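However, with longer surveys that have more questions and longer questions (and hence longer headers), our brute-force approach stops working.
In the end, I want a data frame with column headers, which I will then merge with another data frame from a follow-up survey. Are there any online references for handling this?
Below is an example of a csv file with two header rows. The third and last row is the first row of data. I have changed everything related to private health information. The headers are very long because SurveyGizmo uses the whole question as the header.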
"","","","","","","","","","Inclusion Criteria I or my child is a patient with recurrent respiratory papillomatosis (RRP)How do you know that you or your child has RRP? Please check whatever is true.","","","Exclusion Criteria Do any of the following apply? Please put a check next to any condition that is present.In the unlikely event that one of the following conditions apply, then unfortunately we cannot enroll you in this study. You could stop or you could carry on telling us about yourself, whichever you prefer. ","","Confused or have questions?If you are confused about any items or if you want us to clarify something then here is the place that you can express yourself freely. Also, you can call us at (412) 567-7870 or at (888) 887-7729.You are encouraged to review the consent form. You do not have to sign it now but you will need to do so once we enroll you. ","Please tell us who you are - referring to you, the person completing the form. Different people feel differently about their privacy and about how they are contacted. We will do our utmost to protect your privacy. Please do not give us your e-mail address if you do not want us to use it. Remember that e-mail should be private but is not always so. The safest way to think about it is as if e-mail was similar to a post card. Please do not give us a telephone number you do not want us to contact you on.","","","","","","","","","","","Who are you? Are you the patient or a parent or someone else?","When was the person with RRP born?Enter the date as MM/DD/YYYY","Approximately when was RRP diagnosed? This can be very approximate. If you do not remember the date then please put down your best guess. We will use it to work out how old the patient was when he or she was diagnosed. Enter the date as MM/DD/YYYY.","Has the patient with RRP ever received Gardasil? 
Gardasil is a vaccine against HPV 6, 11, 16 and 18 that was approved by the Food and Drug Administration (FDA) for use in females to prevent gynecologic diseases. ","Please ignore this question. It is for our internal tracking. Are you?","gender","race","Has there been human contact? By e-mail or by telephone or by anything in which we discussed informed consent","What is the subject number?","Merck Research Laboratory Accession Number?","Second Merck Accession Number?","FedEx Tracking Number","Date Shipped Out","Date EMSI Notified"
"Response ID","RespondantKey","Edit Link","IP","Date Started","Date Finished","Status","Linked From","Comments","histopathconfirm","surgeonseaid","other","cancer","none","","First Name","Last Name","Street Address","Apt/Suite/Office","City","State","Postal Code","Country","Email Address","Phone Number","Mobile Phone","","","","","","","","","","","","","",""
"6990181","4099941","http://s-gtzd7-14166.sgizmo.com/?edit=6770181&cc=e246ecb7095b983xxxxx7ec0a9","1991.157.178.134","2009-04-30 07:57:24","2009-04-15 14:56:01","Submitted","","Spoke to her Thursday, 20 Apr 2009 20:26. No questions ready to go.09/11/2009 consent mailed..mrs accession number 304074333811wp, 01wp SFJB06123 Fedex tracking 865888887357 sent Tues April 29; called her Thurs, 10 May 2009 20:21 she will sign slip","histopathconfirm","surgeonseaid","","","none","","Jane","Doe","23 Hastings Rd","29th floor","Oranje","ny","27935","USA","mystry@gmail.com","728-850-7252","626-922-2239","Patient","02/21/1965","01/01/1976","No","Key Person","","","Yes","SFJB06123","304033385811wp","303334485801wp","865333807357","4/11/2007","4/11/2007"
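Answer 0 (score: 5)
Why not just read in the first header row with read.csv (which is the actual header, as I understand your question) and then skip the next line:
read.csv(file, header=TRUE, skip=1)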
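One thing to keep in mind is that read.csv applies skip before it looks for the header, so with skip=1 the second row is the one that ends up as column names. If both header rows are wanted, a minimal sketch along the following lines may help. It lets read.csv parse the header cells itself, so commas inside a quoted question do not split a cell. The function name read_two_header_csv and the demo file are hypothetical, made up for illustration; joining the two rows with "_" follows the approach in the question.

```r
# Hypothetical helper (not part of base R or of any SurveyGizmo tooling).
read_two_header_csv <- function(path) {
  # Read every record as text, with no header, so both header rows are
  # parsed by the CSV reader and quoted commas stay inside their cell.
  raw <- read.csv(path, header = FALSE, colClasses = "character")

  question <- unlist(raw[1, ], use.names = FALSE)  # first header row
  option   <- unlist(raw[2, ], use.names = FALSE)  # second header row

  # Join the two rows, then drop the separator where one of them is blank.
  labels <- paste(question, option, sep = "_")
  labels <- gsub("^_+|_+$", "", labels)

  data <- raw[-(1:2), , drop = FALSE]
  names(data) <- labels
  rownames(data) <- NULL

  # Re-guess column types now that the header rows are gone.
  data[] <- lapply(data, type.convert, as.is = TRUE)
  data
}

# Made-up demo file with the same two-header layout as the export.
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"","How do you know? Please check whatever is true, if any.","","gender"',
  '"Response ID","histopathconfirm","other",""',
  '"101","histopathconfirm","","female"',
  '"102","","other","male"'
), demo_path)

survey <- read_two_header_csv(demo_path)
print(names(survey))
```

The combined names are left as plain strings rather than being passed through make.names, so columns need to be addressed as survey[["Response ID"]]. Duplicate labels are not handled in this sketch.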
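Alternatively, if the second header row begins with a special character (one that is not found in the data), you can designate that row as a comment line by passing the character the line begins with as the value of the comment.char argument (e.g., if that line begins with "#", that would be):
read.csv(file, header=TRUE, comment.char="#")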
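As a small illustration of the comment.char idea, the sketch below writes a made-up file whose second header row has been prefixed with "#" and reads it back. The file contents are an assumption for demonstration only. The sample export in the question starts its second header row with a quote rather than a marker character, so this route only applies if such a marker is actually present, and a "#" outside quotes anywhere in the data would cut that line short as well.

```r
# Made-up file: the second header row is marked with a leading "#".
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"id","gender"',
  '#"Response ID",""',
  '"101","female"',
  '"102","male"'
), demo_path)

# Everything from "#" to the end of the line is ignored, so the marked
# row is dropped and the first row supplies the column names.
df <- read.csv(demo_path, header = TRUE, comment.char = "#")
print(df)
```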
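Answer 1 (score: 0)
Actually, I think the easiest way would be to use the SurveyGizmo SPSS export. Export your data to SPSS and then use a command like this in R:
library(foreign)  # read.spss is provided by the foreign package
read.spss(file = 'mydata.sav')
This should work well for you and will bring all of the data descriptions into R as well.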
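I'm sorry that your call yesterday was confusing. It is true that we are trying not to go back and fix the old, old legacy CSV export. However, it is not that we don't want to support a single-header export. In version 3, the "Quick Export" that is live now is a single-header export and will remain that way, officially supported.
Sadly, that old version of the export was too far removed and too unoptimized to upgrade so that it works with some of the more modern browsers, especially since the new export was only a few weeks away from taking over. I apologize if the call was unclear or unprofessional.
In this case, the SPSS Export is the way to go! If you don't have a plan level that gives you the export option, just e-mail (or call) support and point them to this post. They will add the export to your account.
Cheers,
-Christian Vanek
CTO & Co-Founder
SurveyGizmo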