R read.table with long strings

Time: 2014-05-28 08:28:51

Tags: r read.table

I have a tab-delimited log file (txt):

T<=>31158[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>http://tieba.baidu.com/p/1576129401[=]A<=>3504a4[=]B<=>540532[=]V<=>8.00.6001.18702   34682   77  2012-05-07_07-52-43
T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&#25105;&#29233;&#20320;;for(var i=1;i<=999;i++){PostHandler.post(rich_postor._option.url,c,function(I){rich_postor.showAddResult(I)},function(I){});};void 0[=]A<=>3504a4[=]B<=>540532[=]V<=>8.00.6001.18702   34682   77  2012-05-07_07-52-43
T<=>31212[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>http://tieba.baidu.com/p/1576129401   34682   77  2012-05-07_07-52-43

When I import this file, it returns an error:

df <- read.table("2012-05-07.txt", sep="\t", quote="", stringsAsFactors= FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line 2 did not have 4 elements

If I add the argument fill = TRUE, no error is raised, but the resulting data set is wrong:

df <- read.table("2012-05-07.txt", sep="\t", fill= TRUE, quote="", stringsAsFactors=FALSE)
df[2, ]
V1
T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&
  V2 V3 V4

The first cell should be

T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&#25105;&#29233;&#20320;;for(var i=1;i<=999;i++){PostHandler.post(rich_postor._option.url,c,function(I){rich_postor.showAddResult(I)},function(I){});};void 0[=]A<=>3504a4[=]B<=>540532[=]V<=>8.00.6001.18702

but it is

T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&

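The string is cut off at the first #. The sequence "&#25105;&#29233;&#20320;" appears to be the HTML numeric character references for the Chinese phrase "我爱你" ("I love you"). Can anyone tell me how to get the whole string into a single cell? Thanks a lot!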

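1 answer:

Answer 0: (score: 0)

Set the comment.char argument to an empty string. By default read.table treats # as a comment character and discards the rest of the line after it, which is why the field was cut off at the first #: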

read.table("file.txt", header= TRUE, quote= "", sep="\t", comment.char= "", stringsAsFactors= FALSE)
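
As a rough way to confirm this on a small sample, the sketch below writes three shortened lines to a temporary file and compares the number of fields per line with and without comment handling. The shortened sample lines, the temporary file, and the variable names (tmp, n_default, n_fixed) are assumptions made for illustration and are not taken from the original log. Because the log shown in the question has no header row, the sketch passes header = FALSE rather than header = TRUE.

```r
# Illustrative sample only: shortened versions of the log lines from the
# question, tab-separated, with no header row. The second line contains
# '#' characters inside its first field.
tmp <- tempfile(fileext = ".txt")
sample_lines <- c(
  "T<=>31158[=]U<=>http://tieba.baidu.com/p/1576129401\t34682\t77\t2012-05-07_07-52-43",
  "T<=>31200[=]U<=>javascript:c.content=&#25105;&#29233;&#20320;;void 0\t34682\t77\t2012-05-07_07-52-43",
  "T<=>31212[=]U<=>http://tieba.baidu.com/p/1576129401\t34682\t77\t2012-05-07_07-52-43"
)
writeLines(sample_lines, tmp)

# Fields per line with the default comment character ("#"):
# everything after the first '#' on a line is dropped before splitting.
n_default <- count.fields(tmp, sep = "\t", quote = "")

# Fields per line with comment handling switched off.
n_fixed <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

print(n_default)
print(n_fixed)

# Read the file with comment handling disabled; header = FALSE because
# the sample (like the log in the question) has no header row.
df <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                 header = FALSE, stringsAsFactors = FALSE)

# The first cell of row 2 should now hold the whole string, '#' included.
print(df[2, 1])
```

Comparing n_default with n_fixed is also a quick way to locate the offending lines in a larger file: any line whose field count differs between the two calls contains a # that was being treated as the start of a comment.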