How to read a .csv file that contains double quotes in R (in order to remove them)

Time: 2018-02-20 07:24:55

Tags: r double-quotes read.csv

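I am having trouble reading a .csv file whose data contain double quotes (the German Credit dataset, for example). I would like to know whether there is an effective way to remove the double quotes by specifying an argument in the read function. I have tried several approaches and none of them gave the result I want. Please help me solve this. Thank you.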

  

[The original German Credit .csv dataset](https://i.stack.imgur.com/130cj.png)

I first tried the following code:

GermanCredit <- read.csv("D:/R Statistics/GermanCredit/germancredit.csv", stringsAsFactors = FALSE, header = TRUE, sep = "," , quote = "")

The result is as follows:

  

[read.csv with quote argument](https://i.stack.imgur.com/uwncH.png)

Next, I left out the quote argument:

germancredit <- read.csv("D:/R Statistics/GermanCredit/germancredit.csv", stringsAsFactors = FALSE, header = TRUE, sep = ",")

which produces the following result:

  

[read.csv without quote argument](https://i.stack.imgur.com/Oebu0.png)

On the third attempt I used the read.table function, as follows:

German_Credit <- read.table("D:/R Statistics/GermanCredit/germancredit.csv", quote = NULL, header = TRUE, sep = ",")

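This is no different from the first result. I also tried the fread function (from the data.table package), and the result was no different. Can anyone tell me an effective way to remove the quotes when reading the csv file? Thank you very much.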

dput(readLines("D:/R Statistics/GermanCredit/germancredit.csv", n = 10))
  

c("\"\"\"status\"\",\"\"duration\"\",\"\"credit_history\"\",\"\"purpose\"\",\"\"amount\"\",\"\"savings\"\",\"\"employment_duration\"\",\"\"installment_rate\"\",\"\"personal_status_sex\"\",\"\"other_debtors\"\",\"\"present_residence\"\",\"\"property\"\",\"\"age\"\",\"\"other_installment_plans\"\",\"\"housing\"\",\"\"number_credits\"\",\"\"job\"\",\"\"people_liable\"\",\"\"telephone\"\",\"\"foreign_worker\"\",\"\"credit_risk\"\"\"",
"\"\"\"... < 100 DM\"\",6,\"\"critical account/other credits existing\"\",\"\"domestic appliances\"\",1169,\"\"unknown/no savings account\"\",\"\"... >= 7 years\"\",4,\"\"male : single\"\",\"\"none\"\",4,\"\"real estate\"\",67,\"\"none\"\",\"\"own\"\",2,\"\"skilled employee/official\"\",1,\"\"yes\"\",\"\"yes\"\",1\"",
"\"\"\"0 <= ... < 200 DM\"\",48,\"\"existing credits paid back duly till now\"\",\"\"domestic appliances\"\",5951,\"\"... < 100 DM\"\",\"\"1 <= ... < 4 years\"\",2,\"\"female : divorced/separated/married\"\",\"\"none\"\",2,\"\"real estate\"\",22,\"\"none\"\",\"\"own\"\",1,\"\"skilled employee/official\"\",1,\"\"no\"\",\"\"yes\"\",0\"",
"\"\"\"no checking account\"\",12,\"\"critical account/other credits existing\"\",\"\"retraining\"\",2096,\"\"... < 100 DM\"\",\"\"4 <= ... < 7 years\"\",2,\"\"male : single\"\",\"\"none\"\",3,\"\"real estate\"\",49,\"\"none\"\",\"\"own\"\",1,\"\"unskilled - resident\"\",2,\"\"no\"\",\"\"yes\"\",1\"",
"\"\"\"... < 100 DM\"\",42,\"\"existing credits paid back duly till now\"\",\"\"radio/television\"\",7882,\"\"... < 100 DM\"\",\"\"4 <= ... < 7 years\"\",2,\"\"male : single\"\",\"\"guarantor\"\",4,\"\"building society savings agreement/life insurance\"\",45,\"\"none\"\",\"\"for free\"\",1,\"\"skilled employee/official\"\",2,\"\"no\"\",\"\"yes\"\",1\"",
"\"\"\"... < 100 DM\"\",24,\"\"delay in paying off in the past\"\",\"\"car (new)\"\",4870,\"\"... < 100 DM\"\",\"\"1 <= ... < 4 years\"\",3,\"\"male : single\"\",\"\"none\"\",4,\"\"unknown/no property\"\",53,\"\"none\"\",\"\"for free\"\",2,\"\"skilled employee/official\"\",2,\"\"no\"\",\"\"yes\"\",0\"",
"\"\"\"no checking account\"\",36,\"\"existing credits paid back duly till now\"\",\"\"retraining\"\",9055,\"\"unknown/no savings account\"\",\"\"1 <= ... < 4 years\"\",2,\"\"male : single\"\",\"\"none\"\",4,\"\"unknown/no property\"\",35,\"\"none\"\",\"\"for free\"\",1,\"\"unskilled - resident\"\",2,\"\"yes\"\",\"\"yes\"\",1\"",
"\"\"\"no checking account\"\",24,\"\"existing credits paid back duly till now\"\",\"\"radio/television\"\",2835,\"\"500 <= ... < 1000 DM\"\",\"\"... >= 7 years\"\",3,\"\"male : single\"\",\"\"none\"\",4,\"\"building society savings agreement/life insurance\"\",53,\"\"none\"\",\"\"own\"\",1,\"\"skilled employee/official\"\",1,\"\"no\"\",\"\"yes\"\",1\"",
"\"\"\"0 <= ... < 200 DM\"\",36,\"\"existing credits paid back duly till now\"\",\"\"car (used)\"\",6948,\"\"... < 100 DM\"\",\"\"1 <= ... < 4 years\"\",2,\"\"male : single\"\",\"\"none\"\",2,\"\"car or other\"\",35,\"\"none\"\",\"\"rent\"\",1,\"\"management/self-employed/highly qualified employee/officer\"\",1,\"\"yes\"\",\"\"yes\"\",1\"",
"\"\"\"no checking account\"\",12,\"\"existing credits paid back duly till now\"\",\"\"domestic appliances\"\",3059,\"\"... >= 1000 DM\"\",\"\"4 <= ... < 7 years\"\",2,\"\"male : divorced/separated\"\",\"\"none\"\",4,\"\"real estate\"\",61,\"\"none\"\",\"\"own\"\",1,\"\"unskilled - resident\"\",1,\"\"no\"\",\"\"yes\"\",1\"")

1 answer:

Answer 0 (score: 1)

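There are two odd things about your file:

  • The file uses doubled double quotes ("") around its values
  • Each line in the file is itself wrapped in quotes

"""a"",1"
"""b"",2"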

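This double quoting probably arose because the file was a csv file that was read incorrectly (for example, with the wrong separator, such as ';') and was then written back out as a csv file.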
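One way to check this explanation is to reproduce it on a tiny file and then see how the read attempts from the question behave on the result. The sketch below is only an illustration: the data frame, the temporary files, and the use of `quote = ""` during the misread are assumptions made for the example, and `header = FALSE` is used so that the column names are not altered by name checking.

```r
# A small, well-formed CSV to start from (hypothetical data)
df <- data.frame(id = c("a", "b"), n = 1:2)
good_file <- tempfile(fileext = ".csv")
write.csv(df, good_file, row.names = FALSE)

# Misread it: wrong separator and (assumption) quoting switched off,
# so each line arrives as one string with its quote characters intact
misread <- read.csv(good_file, sep = ";", quote = "", header = FALSE,
                    stringsAsFactors = FALSE)

# Write it back out as CSV: embedded quotes are doubled and each
# string is wrapped in a new pair of quotes
bad_file <- tempfile(fileext = ".csv")
write.table(misread, bad_file, sep = ",", row.names = FALSE,
            col.names = FALSE, qmethod = "double")
bad_lines <- readLines(bad_file)
writeLines(bad_lines)

# The kinds of attempt made in the question, applied to the malformed file
literal <- read.csv(bad_file, header = FALSE, quote = "",
                    stringsAsFactors = FALSE)  # quote characters stay in the data
one_col <- read.csv(bad_file, header = FALSE,
                    stringsAsFactors = FALSE)  # each line becomes a single field
```

With quoting disabled the quote characters end up inside the values, and with default quoting each line collapses into one column, which is consistent with the two symptoms described in the question.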

Removing the outer quotes first and then reading with the double quote as the quote character (as @ytu suggested) seems to work:

lines <- readLines("<yourfile>")
lines <- gsub('(^"|"$)', "", lines)
read.csv(textConnection(lines), quote = '""')
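The same idea can be tried without touching a real file. The following is a minimal sketch on three made-up lines of the same shape; `read_double_quoted_csv` is a hypothetical helper name, not something from the answer, and it uses the `text` argument of read.csv in place of `textConnection`. A second variant that simply parses twice with the defaults is included for comparison.

```r
# Hypothetical helper: strip the outer quotes from each line,
# then parse what is left as ordinary CSV
read_double_quoted_csv <- function(lines, ...) {
  stripped <- gsub('(^"|"$)', "", lines)  # drop one leading and one trailing quote
  read.csv(text = stripped, ...)          # default quoting absorbs the remaining ""
}

# Made-up sample in the same shape as the problem file
sample_lines <- c('"""id"",""n"""', '"""a"",1"', '"""b"",2"')

fixed <- read_double_quoted_csv(sample_lines, stringsAsFactors = FALSE)
str(fixed)

# Alternative: parse twice with the defaults. The first pass unwraps each
# line into one string; the second pass splits that string into columns.
pass1 <- read.csv(text = sample_lines, header = FALSE, stringsAsFactors = FALSE)
two_pass <- read.csv(text = pass1$V1, stringsAsFactors = FALSE)
```

Because `quote` is a set of quoting characters, `'""'` names the same single character as read.csv's default, so the helper leaves the argument at its default. On this sample both variants give the same data frame. The two-pass variant should also cope with values that contain commas, which the strip-then-parse approach would split into separate fields.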