I have data exported from MySQL with the following command:
SELECT id_code, info_text
INTO OUTFILE '/tmp/company-desc.csv'
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM dx_company
WHERE LENGTH(id_code) = 8
  AND id_code REGEXP '^[0-9]+$';
But when I try to load the CSV in R with either of the following commands,
dt.companydesc <- read.csv("company-desc.csv",sep=';',fill=T, encoding = "UTF-8",quote="\n",header=FALSE)
or
dt.companydesc <- read.csv("company-desc.csv",sep=';',fill=T, encoding = "UTF-8",quote="\"",header=FALSE)
the result looks like this:
Id code description
2345 This is the description \n344555 \n737384 \n388383 \n000083
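Some of the IDs end up mixed into the description. The problem seems to be the quotes and the \n when reading the file. If I try to specify both as quote characters, the whole table gets scrambled. I have also tried gsub and readLines. Any help would be appreciated.
Snapshot (CSV file):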
"1000004";"general"
"1000000";"licensed version, and products"
"1000007";""
"1000003";""
"1000002";""
"1000006";""
"1000002";"automobiles; well organised"
Expected output:
Id_code Description
1000004 general
1000000 licensed version, and products
1000007 NA
1000003 NA
1000002 NA
1000006 NA
1000002 automobiles and industry; well organised
Answer 0 (score: 2)
Here is one way using data.table::fread, which is also faster:
require(data.table) # v1.9.6+
fread(' "1000004";"general"
"1000000";"licensed version, and products"
"1000007";""
"1000003";""
"1000002";""
"1000006";""
"1000002";"automobiles; well organised"', na.strings="",
header=FALSE, col.names=c("Id_code", "Description"))
# Id_code Description
# 1: 1000004 general
# 2: 1000000 licensed version, and products
# 3: 1000007 NA
# 4: 1000003 NA
# 5: 1000002 NA
# 6: 1000006 NA
# 7: 1000002 automobiles; well organised
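
For comparison, here is a minimal base R sketch of the same idea, assuming the file looks like the snapshot in the question (every field wrapped in double quotes, ';' as the separator, no backslash-escaped quotes inside fields). The names csv_lines and company_desc are only illustrative. The points that seem to matter are quote = "\"", so that a ';' inside a quoted description does not split the field, and na.strings = "", so that the empty "" fields are read as NA.

```r
# Snapshot rows from the question, one CSV line per element.
csv_lines <- c(
  '"1000004";"general"',
  '"1000000";"licensed version, and products"',
  '"1000007";""',
  '"1000003";""',
  '"1000002";""',
  '"1000006";""',
  '"1000002";"automobiles; well organised"'
)

# quote = "\"" keeps a ';' inside a quoted field from starting a new column;
# na.strings = "" maps the empty "" fields to NA.
company_desc <- read.csv(
  text = paste(csv_lines, collapse = "\n"),
  sep = ";",
  quote = "\"",
  header = FALSE,
  na.strings = "",
  col.names = c("Id_code", "Description"),
  stringsAsFactors = FALSE
)

print(company_desc)
```

To read the real export, replace the text = argument with the path to company-desc.csv. If the actual descriptions contain escaped quotes or other characters not present in the snapshot, this sketch may need adjusting.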