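I'm having trouble reading a .csv file in which the values are wrapped in double quotes (the German Credit dataset, for example). Is there an efficient way to strip the double quotes through an argument of the read function? I've tried several approaches and none of them gave the result I want. Please help me solve this. Thanks.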
[Screenshot: the original germancredit.csv file]
I first tried the following code:
GermanCredit <- read.csv("D:/R Statistics/GermanCredit/germancredit.csv", stringsAsFactors = FALSE, header = TRUE, sep = "," , quote = "")
which gives this result:
[Screenshot: output of read.csv with quote = ""]
Then I left out the quote argument:
germancredit <- read.csv("D:/R Statistics/GermanCredit/germancredit.csv", stringsAsFactors = FALSE, header = TRUE, sep = ",")
which produces:
[Screenshot: output of read.csv without the quote argument]
On my third attempt I used read.table:
German_Credit <- read.table("D:/R Statistics/GermanCredit/germancredit.csv", quote = NULL, header = TRUE, sep = ",")
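There was no difference from the first attempt. I also tried the fread function from the data.table package, and the result was no different. Can anyone tell me an effective way to remove the quotes when reading the CSV file? Thank you very much.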
dput(readLines("D:/R Statistics/GermanCredit/germancredit.csv", n = 10))
c("\"\"\"status\"\",\"\"duration\"\",\"\"credit_history\"\",\"\"purpose\"\",\"\"amount\"\",\"\"savings\"\",\"\"employment_duration\"\",\"\"installment_rate\"\",\"\"personal_status_sex\"\",\"\"other_debtors\"\",\"\"present_residence\"\",\"\"property\"\",\"\"age\"\",\"\"other_installment_plans\"\",\"\"housing\"\",\"\"number_credits\"\",\"\"job\"\",\"\"people_liable\"\",\"\"telephone\"\",\"\"foreign_worker\"\",\"\"credit_risk\"\"\"", 
"\"\"\"... < 100 DM\"\",6,\"\"critical account/other credits existing\"\",\"\"domestic appliances\"\",1169,\"\"unknown/no savings account\"\",\"\"... >= 7 years\"\",4,\"\"male : single\"\",\"\"none\"\",4,\"\"real estate\"\",67,\"\"none\"\",\"\"own\"\",2,\"\"skilled employee/official\"\",1,\"\"yes\"\",\"\"yes\"\",1\"", 
"\"\"\"0 <= ... < 200 DM\"\",48,\"\"existing credits paid back duly till now\"\",\"\"domestic appliances\"\",5951,\"\"... < 100 DM\"\",\"\"1 <= ... < 4 years\"\",2,\"\"female : divorced/separated/married\"\",\"\"none\"\",2,\"\"real estate\"\",22,\"\"none\"\",\"\"own\"\",1,\"\"skilled employee/official\"\",1,\"\"no\"\",\"\"yes\"\",0\"", 
"\"\"\"no checking account\"\",12,\"\"critical account/other credits existing\"\",\"\"retraining\"\",2096,\"\"... < 100 DM\"\",\"\"4 <= ... < 7 years\"\",2,\"\"male : single\"\",\"\"none\"\",3,\"\"real estate\"\",49,\"\"none\"\",\"\"own\"\",1,\"\"unskilled - resident\"\",2,\"\"no\"\",\"\"yes\"\",1\"", 
"\"\"\"... < 100 DM\"\",42,\"\"existing credits paid back duly till now\"\",\"\"radio/television\"\",7882,\"\"... < 100 DM\"\",\"\"4 <= ... < 7 years\"\",2,\"\"male : single\"\",\"\"guarantor\"\",4,\"\"building society savings agreement/life insurance\"\",45,\"\"none\"\",\"\"for free\"\",1,\"\"skilled employee/official\"\",2,\"\"no\"\",\"\"yes\"\",1\"", 
"\"\"\"... < 100 DM\"\",24,\"\"delay in paying off in the past\"\",\"\"car (new)\"\",4870,\"\"... < 100 DM\"\",\"\"1 <= ... < 4 years\"\",3,\"\"male : single\"\",\"\"none\"\",4,\"\"unknown/no property\"\",53,\"\"none\"\",\"\"for free\"\",2,\"\"skilled employee/official\"\",2,\"\"no\"\",\"\"yes\"\",0\"", 
"\"\"\"no checking account\"\",36,\"\"existing credits paid back duly till now\"\",\"\"retraining\"\",9055,\"\"unknown/no savings account\"\",\"\"1 <= ... < 4 years\"\",2,\"\"male : single\"\",\"\"none\"\",4,\"\"unknown/no property\"\",35,\"\"none\"\",\"\"for free\"\",1,\"\"unskilled - resident\"\",2,\"\"yes\"\",\"\"yes\"\",1\"", 
"\"\"\"no checking account\"\",24,\"\"existing credits paid back duly till now\"\",\"\"radio/television\"\",2835,\"\"500 <= ... < 1000 DM\"\",\"\"... >= 7 years\"\",3,\"\"male : single\"\",\"\"none\"\",4,\"\"building society savings agreement/life insurance\"\",53,\"\"none\"\",\"\"own\"\",1,\"\"skilled employee/official\"\",1,\"\"no\"\",\"\"yes\"\",1\"", 
"\"\"\"0 <= ... < 200 DM\"\",36,\"\"existing credits paid back duly till now\"\",\"\"car (used)\"\",6948,\"\"... < 100 DM\"\",\"\"1 <= ... < 4 years\"\",2,\"\"male : single\"\",\"\"none\"\",2,\"\"car or other\"\",35,\"\"none\"\",\"\"rent\"\",1,\"\"management/self-employed/highly qualified employee/officer\"\",1,\"\"yes\"\",\"\"yes\"\",1\"", 
"\"\"\"no checking account\"\",12,\"\"existing credits paid back duly till now\"\",\"\"domestic appliances\"\",3059,\"\"... >= 1000 DM\"\",\"\"4 <= ... < 7 years\"\",2,\"\"male : divorced/separated\"\",\"\"none\"\",4,\"\"real estate\"\",61,\"\"none\"\",\"\"own\"\",1,\"\"unskilled - resident\"\",1,\"\"no\"\",\"\"yes\"\",1\"")
Answer (score: 1)
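There are two odd things in your file: every line is wrapped in an extra pair of double quotes, and the quotes inside each line are doubled (""). The lines look like this: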
"""a"",1"
"""b"",2"
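This probably happened because the file is a CSV file that was read in incorrectly (for example, with the wrong separator, such as ';'), so that each whole line ended up in a single text column, and was then written out again as a CSV file. write.csv wraps that text field in quotes and doubles every quote inside it, which produces exactly this pattern.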
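To check that explanation, here is a small, hypothetical reproduction (the toy columns and values are made up, not taken from the real file). A well-formed CSV is read back with sep = ";" and quote = "", so each whole line, quotes included, lands in one column. That column is then written out with write.table(..., qmethod = "double"); using write.table with col.names = FALSE instead of write.csv is an assumption made only so that no extra header line is added.

```r
# Hypothetical toy data standing in for the real German Credit columns.
good <- data.frame(status = c("a", "b"), duration = c(6L, 48L),
                   stringsAsFactors = FALSE)
f1 <- tempfile(fileext = ".csv")
write.csv(good, f1, row.names = FALSE)

# Wrong read: sep = ";" and quote = "" leave each whole line,
# quotes included, in a single character column (V1).
bad <- read.csv(f1, sep = ";", quote = "", header = FALSE,
                stringsAsFactors = FALSE)

# Write that one column back out as CSV: each value is wrapped in quotes
# and every quote inside it is doubled (qmethod = "double").
f2 <- tempfile(fileext = ".csv")
write.table(bad, f2, sep = ",", row.names = FALSE, col.names = FALSE,
            qmethod = "double")

round_tripped <- readLines(f2)
print(round_tripped)
```

The printed lines have the same shape as the ones in the dput() output of the question: outer quotes around the whole line and "" around each original value.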
Removing the outer quotes first and then reading with the double quote as the quote character (as @ytu suggested) seems to work:
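lines <- readLines("<yourfile>")
# Strip the extra quote at the start and at the end of every line
lines <- gsub('(^"|"$)', "", lines)
# Read the cleaned lines as ordinary CSV text
read.csv(text = lines, quote = '""', stringsAsFactors = FALSE)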
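An alternative that avoids the regular expression, sketched under the assumption that every line follows the pattern in the dput() output: since each line looks like a single CSV-quoted field, reading the file once as a one-column CSV should undo both the outer quotes and the doubled quotes, and the recovered lines can then be parsed as a normal CSV. The helper name read_double_quoted_csv and the toy file are hypothetical, not part of the answer above.

```r
# Hypothetical helper: read a CSV whose every line was itself written out
# as one quoted CSV field (outer quotes plus doubled inner quotes).
read_double_quoted_csv <- function(path) {
  # Pass 1: each physical line is one quoted field; read.csv strips the
  # outer quotes and turns every "" back into a single ".
  inner <- read.csv(path, header = FALSE, stringsAsFactors = FALSE,
                    colClasses = "character")[[1]]
  # Pass 2: the recovered lines are ordinary CSV text with a header row.
  read.csv(text = inner, stringsAsFactors = FALSE)
}

# Toy file mimicking the structure seen in the dput() output.
tmp <- tempfile(fileext = ".csv")
writeLines(c('"""status"",""duration"""',
             '"""... < 100 DM"",6"',
             '"""no checking account"",12"'), tmp)

german <- read_double_quoted_csv(tmp)
str(german)
```

For the real file you would call read_double_quoted_csv("D:/R Statistics/GermanCredit/germancredit.csv"). Unlike stripping quotes with a regular expression, this relies on read.csv's own handling of quoted fields, so values that contain commas inside the quotes would still be read correctly.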