Unable to bulk import free-flowing text with MonetDB.R

Date: 2015-12-24 07:07:48

Tags: r monetdb monetdblite

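I am trying to import a dataset of 217,000 records (the Jeopardy Dataset) into MonetDB through the MonetDB.R interface.

The file is a CSV file; the header and first two records look like this: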

show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3
4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's,,,

4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams,,,

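The problem I am facing is importing the ques column (the data enclosed in double quotes). The column contains multiple commas and punctuation marks, and monet.read.csv fails to import it.

I tried importing some records without the ques column, and that worked fine.

Could you suggest how to import a column with free-flowing text into MonetDB? After the import, I intend to do some text analysis on the column.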

1 answer:

Answer 0: (score: 1)

Use monet.read.csv.

I also prefer MonetDBLite for ease of setup, but monet.read.csv is provided by MonetDB.R, so both packages are loaded here. Thanks.

mylines <-
    c("show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3", 
    "4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,", 
    "4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,\"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States\",John Adams,,,")

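# temporary CSV file and a temporary folder for the embedded database
tf <- tempfile()
dbfolder <- tempdir()

# write the sample header and records to the temporary file
writeLines( mylines , tf )

library(MonetDBLite)
library(MonetDB.R)

# connect to an embedded MonetDBLite database stored in dbfolder
db <- dbConnect( MonetDBLite() , dbfolder )

# import the CSV, quoted ques field included, into a new table
monet.read.csv( db , tf , 'mytable' )

# looks ok to me
dbReadTable( db , 'mytable' )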
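
If the import still fails on the full file, it may help to confirm that the quoted ques field is well-formed CSV independently of MonetDB. The following is a minimal sketch using only base R's read.csv on the same sample lines; it is not part of the original answer. The names sample_lines, csv_path, and parsed are made up for illustration, and the column is picked by position (6) on the assumption that ques is the sixth field, as in the header shown in the question.

```r
# Sanity check with base R only: does the quoted free-text field parse
# as a single column? (Illustrative names; not from the original answer.)
sample_lines <- c(
    "show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3",
    "4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,",
    "4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,\"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States\",John Adams,,,"
)

csv_path <- tempfile(fileext = ".csv")
writeLines(sample_lines, csv_path)

# read.csv treats the double quote as the only quote character, so the
# apostrophe in "McDonald's" is plain data and the doubled quotes inside
# the ques field are read as literal quotes
parsed <- read.csv(csv_path, stringsAsFactors = FALSE, strip.white = TRUE)

# every record should split into the same ten fields as the header
stopifnot(nrow(parsed) == 2, ncol(parsed) == 10)

# the sixth column holds the free-text question, commas included
print(parsed[[6]])
```

If this check passes on the sample but fails on the real file, the offending records are probably ones whose quoting differs from the two rows shown here, for example an unbalanced double quote.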