Problems reading data with read.csv and read.table in R

Time: 2015-10-05 11:51:18

Tags: mysql r csv

I have data that was exported from MySQL with the following command:

SELECT 
    id_code,info_text INTO OUTFILE '/tmp/company-desc.csv' 
    FIELDS TERMINATED BY ';' 
    OPTIONALLY ENCLOSED BY '"' 
    LINES TERMINATED BY '\n'
FROM 
    dx_company WHERE LENGTH(id_code) = 8 AND 
    id_code REGEXP '^[0-9]+$';

But when I try to load the CSV in R with either of the following commands,

 dt.companydesc <- read.csv("company-desc.csv",sep=';',fill=T, encoding = "UTF-8",quote="\n",header=FALSE)

dt.companydesc <- read.csv("company-desc.csv",sep=';',fill=T, encoding = "UTF-8",quote="\"",header=FALSE)

the result looks like this:

Id code  description
2345     This is the description \n344555 \n737384 \n388383 \n000083

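Some of the IDs end up mixed into the description. The problem seems to come from the quote characters and the \n line terminators during reading. If I try to specify both, the whole table gets scrambled. I have also tried gsub and readLines without success. Any help would be appreciated.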

Snapshot (CSV file):

  "1000004";"general"
  "1000000";"licensed version, and products"
  "1000007";""
  "1000003";""
  "1000002";""
  "1000006";""
  "1000002";"automobiles; well organised"

Expected output:

   Id_code  Description
  1000004  general
  1000000  licensed version, and products
  1000007  NA
  1000003  NA
  1000002  NA
  1000006  NA
  1000002  automobiles; well organised

1 Answer:

Answer 0 (score: 2)

Here is one way using data.table::fread, which is also faster:

require(data.table) # v1.9.6+
fread('  "1000004";"general"
  "1000000";"licensed version, and products"
  "1000007";""
  "1000003";""
  "1000002";""
  "1000006";""
  "1000002";"automobiles; well organised"', na.strings="", 
header=FALSE, col.names=c("Id_code", "Description"))

#    Id_code                    Description
# 1: 1000004                        general
# 2: 1000000 licensed version, and products
# 3: 1000007                             NA
# 4: 1000003                             NA
# 5: 1000002                             NA
# 6: 1000006                             NA
# 7: 1000002    automobiles; well organised
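
For comparison, here is a minimal base R sketch of the same idea, reading from a file instead of an inline string. It is only an illustration under a few assumptions: the temporary file and the name `company_desc` are made up for the example, the sample rows are copied from the question's snapshot, and empty descriptions are converted to NA in an explicit step rather than relying on `na.strings`. That explicit step may also be useful with fread, since depending on the data.table version a quoted empty field can come back as "" rather than NA.

```r
# Recreate a few rows of the question's snapshot in a temporary file
# (illustrative only; in practice this would be company-desc.csv).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"1000004";"general"',
  '"1000000";"licensed version, and products"',
  '"1000007";""',
  '"1000002";"automobiles; well organised"'
), csv_path)

# Sanity check: with ';' as separator and '"' as the quote character,
# every line should split into exactly two fields.
field_counts <- count.fields(csv_path, sep = ";", quote = "\"",
                             comment.char = "")

company_desc <- read.table(
  csv_path,
  sep = ";",                 # field separator used in the MySQL export
  quote = "\"",              # fields are enclosed in double quotes, not "\n"
  header = FALSE,
  col.names = c("Id_code", "Description"),
  colClasses = "character",  # keeps any leading zeros in Id_code
  comment.char = ""          # do not treat '#' inside text as a comment
)

# Convert empty descriptions to NA explicitly.
company_desc$Description[company_desc$Description %in% ""] <- NA

print(company_desc)
```

Setting `quote = "\""` is what lets the semicolon inside "automobiles; well organised" stay within a single field. If `field_counts` contains values other than 2 for the real file, those lines are a reasonable place to start looking for stray quotes or line breaks.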