Question

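I'm having trouble importing a csv file into R (using read.csv()).

I want to import a csv data file into a data frame in R, with all columns except "Value" set as character columns. The "Value" vector should be a numeric column. Can anyone help me?

I have done this many times with other files, but for some reason this file is not cooperating. The problem may be that the file is European style (with "." as the decimal separator). I'm not sure.

Here is a link to the file: https://www.dropbox.com/s/9kqjiy5phj9qkg3/albania_%2B.csv?dl=0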

Answer 1

Read the file in with readLines and remove the first (^") and last ("$) double quote of each line, as well as any double quote that is followed by another double quote ("(?=")), creating L. Then read L with read.csv, specifying as.is=TRUE so that the text columns come in as "character" columns and Value as a "numeric" column:

L <- gsub('^"|"$|"(?=")', '', readLines("albania_+.csv"), perl = TRUE)    
DF <- read.csv(text = L, as.is = TRUE)

giving:

> str(DF)
'data.frame':   544 obs. of  10 variables:
 $ Country.or.Area: chr  "Albania" "Albania" "Albania" "Albania" ...
 $ Year           : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
 $ Area           : chr  "Urban" "Urban" "Urban" "Urban" ...
 $ Sex            : chr  "Female" "Female" "Female" "Female" ...
 $ Age            : chr  "Total" "0 - 4" "5 - 9" "10 - 14" ...
 $ Record.Type    : chr  "Estimate - de facto" "Estimate - de facto" "Estimate - de facto" "Estimate - de facto" ...
 $ Reliability    : chr  "Final figure, complete" "Final figure, complete" "Final figure, complete" "Final figure, complete" ...
 $ Source.Year    : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
 $ Value          : num  763925 39796 42761 55894 68627 ...
 $ Value.Footnotes: logi  NA NA NA NA NA NA ...

[Figure: regular expression visualization, from the Debuggex Demo]
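
To see what each alternative in that pattern removes, here is a minimal sketch on a single made-up line. The line is only an assumption about what the damage looks like (outer quotes around the whole line, doubled inner quotes, empty last field); it is not copied from the real file, and `line` and `cleaned` are illustrative names.

```r
# Hypothetical line imitating the damage described in this thread.
line <- '"""Albania"",""2012"",""Urban"",""763925"","'

# ^"      the quote that opens the line
# "$      the quote that closes the line
# "(?=")  any quote immediately followed by another quote
#         (a lookahead, which is why perl = TRUE is needed)
cleaned <- gsub('^"|"$|"(?=")', '', line, perl = TRUE)
cat(cleaned, "\n")
# "Albania","2012","Urban","763925",

# The repaired text now parses as ordinary CSV; header = FALSE because
# this sketch has no header line.
DF <- read.csv(text = cleaned, header = FALSE, as.is = TRUE)
str(DF)
```

The trailing comma in the cleaned line is the empty last field, which read.csv reads as NA, in the same way as the Value.Footnotes column in the output above.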

Answer 2

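I took a look at your file and it appears to be badly formatted. There are 3 problems:

Every line starts with an unnecessary quote (").
Every line ends with an unnecessary quote (").
For some reason the quotes are doubled. Instead of "fieldvalue", your file has ""fieldvalue"".

This is just a workaround for this particular file (don't worry about the warning you get after the first line):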

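 # Read the raw lines so the stray quotes can be repaired before parsing
 textfile <- readLines("albania_+.csv")
 # Inner gsub drops the quote that opens and closes each line; outer gsub turns "" into "
 x <- gsub('"{2}', '"', gsub('(^"|"$)', "", textfile))
 res <- read.csv(text = x, stringsAsFactors = FALSE)
 str(res)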
 #'data.frame': 544 obs. of  10 variables:
 #$ Country.or.Area: chr  "Albania" "Albania" "Albania" "Albania" ...
 #$ Year           : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
 #$ Area           : chr  "Urban" "Urban" "Urban" "Urban" ...
 #$ Sex            : chr  "Female" "Female" "Female" "Female" ...
 #$ Age            : chr  "Total" "0 - 4" "5 - 9" "10 - 14" ...
 #$ Record.Type    : chr  "Estimate - de facto" "Estimate - de facto" "Estimate - de facto" "Estimate - de facto" ...
 #$ Reliability    : chr  "Final figure, complete" "Final figure, complete" "Final figure, complete" "Final figure, complete" ...
 #$ Source.Year    : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
 #$ Value          : num  763925 39796 42761 55894 68627 ...
 #$ Value.Footnotes: logi  NA NA NA NA NA NA ...
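
In the str() output, Year and Source.Year come back as int because read.csv is left to guess the column types. If every column except Value really has to be character, one option is to read everything as character and convert Value afterwards. A minimal sketch, assuming made-up sample lines rather than the real file (`raw`, `fixed` and the three-column layout are illustrative only):

```r
# Hypothetical lines imitating the damage described above:
# outer quotes around each line and doubled inner quotes.
raw <- c(
  '"""Country"",""Year"",""Value"""',
  '"""Albania"",""2012"",""763925"""',
  '"""Albania"",""2012"",""39796"""'
)

# Same two-step repair as the workaround: strip the outer quotes,
# then collapse each doubled quote into a single one.
fixed <- gsub('"{2}', '"', gsub('(^"|"$)', "", raw))

# colClasses = "character" is recycled over all columns, so nothing
# is guessed; Value is converted explicitly afterwards.
DF <- read.csv(text = fixed, colClasses = "character")
DF$Value <- as.numeric(DF$Value)
str(DF)
```

Converting after the read avoids depending on how read.csv treats quoted numbers when a numeric class is requested directly.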

read.csv() in R with all character columns and one numeric column

2 answers: