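I am trying to import a dataset of 217,000 records (the Jeopardy dataset) into MonetDB through the MonetDB.R interface.
The file is a CSV. Its header and first two data rows look like this: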
show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3
4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's,,,
4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams,,,
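The problem I am running into is importing the ques column (the data enclosed in double quotes). That column contains multiple commas and other punctuation, and monet.read.csv fails to import it.
I tried importing some records without the ques column, and that worked fine.
Could you suggest how to import a column of free-flowing text into MonetDB? After the import, I plan to run some text analysis on that column.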
Answer 0 (score: 1)
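Use monet.read.csv. With the sample rows from the question, the quoted ques column imports fine for me.
I also prefer MonetDBLite for ease of setup, but monet.read.csv is available only in MonetDB.R, so the example loads both packages.
Thanks.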
mylines <-
c("show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3",
"4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,",
"4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,\"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States\",John Adams,,,")
tf <- tempfile()
dbfolder <- tempdir()
writeLines( mylines , tf )
library(MonetDBLite)
library(MonetDB.R)
db <- dbConnect( MonetDBLite() , dbfolder )
monet.read.csv( db , tf , 'mytable' )
# looks ok to me
dbReadTable( db , 'mytable' )
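
If the import still fails on the full file, it may help to first check that the file itself is well-formed, independent of MonetDB. The following is only a minimal sketch using base R's read.csv, not anything from MonetDB.R. The names csv_lines, csv_path and parsed are made up for illustration. It checks that a quoted field with embedded commas and doubled quotes is parsed as a single column.

```r
# Sanity check with base R only; this does not touch MonetDB at all.
# csv_lines, csv_path and parsed are illustrative names, not from the question.
csv_lines <- c(
  "show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3",
  "4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Only the double quote is treated as a quoting character, so the
# apostrophe in "McDonald's" is read as ordinary text.
parsed <- read.csv(csv_path, quote = "\"", stringsAsFactors = FALSE)

ncol(parsed)   # number of columns detected
parsed[[6]]    # the ques column, which should hold the whole sentence
```

If base R splits the real file into the expected ten columns, the quoting in the data is probably not the problem, and the monet.read.csv call or its arguments would be the next thing to look at.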