Question

I have a tab-delimited log file (txt):

T<=>31158[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>http://tieba.baidu.com/p/1576129401[=]A<=>3504a4[=]B<=>540532[=]V<=>8.00.6001.18702   34682   77  2012-05-07_07-52-43
T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&#25105;&#29233;&#20320;;for(var i=1;i<=999;i++){PostHandler.post(rich_postor._option.url,c,function(I){rich_postor.showAddResult(I)},function(I){});};void 0[=]A<=>3504a4[=]B<=>540532[=]V<=>8.00.6001.18702   34682   77  2012-05-07_07-52-43
T<=>31212[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>http://tieba.baidu.com/p/1576129401   34682   77  2012-05-07_07-52-43

Importing this file returns an error:

df <- read.table("2012-05-07.txt", sep="\t", quote="", stringsAsFactors= FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line 2 did not have 4 elements

If I add the argument fill = TRUE, no error is raised, but the resulting data set is wrong:

df <- read.table("2012-05-07.txt", sep="\t", fill= TRUE, quote="", stringsAsFactors=FALSE)
df[2, ]
V1
T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&
  V2 V3 V4

The first cell should be

T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&#25105;&#29233;&#20320;;for(var i=1;i<=999;i++){PostHandler.post(rich_postor._option.url,c,function(I){rich_postor.showAddResult(I)},function(I){});};void 0[=]A<=>3504a4[=]B<=>540532[=]V<=>8.00.6001.18702

but it is

T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&

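The value is cut off at the first #. The string "&#25105;&#29233;&#20320;" appears to be the HTML-entity encoding of the Chinese phrase "我爱你" ("I love you"). Can anyone tell me how to get the whole string into a single cell? Many thanks!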

Answer 1

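By default, read.table uses comment.char = "#", so everything from the first # to the end of the line is treated as a comment and dropped. Set the comment.char argument to an empty string to turn this off: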

read.table("file.txt", header= TRUE, quote= "", sep="\t", comment.char= "", stringsAsFactors= FALSE)
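
To see the effect in isolation, here is a minimal sketch that reads the same data twice, once with the default comment.char and once with comment.char = "". The sample lines, the example.com URL and the temporary file are made up for illustration and only mimic the shape of the log in the question.

```r
# Hypothetical two-line sample: tab-separated, with '#' inside the first field
# of the second line (HTML entities such as &#25105;).
lines <- c(
  "U<=>http://example.com/p/1\t34682\t77\t2012-05-07_07-52-43",
  "U<=>javascript:c.content=&#25105;&#29233;&#20320;;void 0\t34682\t77\t2012-05-07_07-52-43"
)
tmp <- tempfile(fileext = ".txt")  # temporary stand-in for the real log file
writeLines(lines, tmp)

# Default comment.char = "#": line 2 is cut at the first '#', and fill = TRUE
# pads the missing fields instead of raising an error.
truncated <- read.table(tmp, sep = "\t", quote = "", fill = TRUE,
                        stringsAsFactors = FALSE)

# comment.char = "": '#' is ordinary data, so all four fields are read
# and fill = TRUE is no longer needed.
full <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

print(truncated[2, 1])
print(full[2, 1])
print(ncol(full))
```

Note that the call above omits header = TRUE: the log in the question does not seem to have a header row, and header = TRUE would consume the first record as column names.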
