I have a tab-delimited log file (txt):
T<=>31158[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>http://tieba.baidu.com/p/1576129401[=]A<=>3504a4[=]B<=>540532[=]V<=>8.00.6001.18702 34682 77 2012-05-07_07-52-43
T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&#25105;&#29233;&#20320;;for(var i=1;i<=999;i++){PostHandler.post(rich_postor._option.url,c,function(I){rich_postor.showAddResult(I)},function(I){});};void 0[=]A<=>3504a4[=]B<=>540532[=]V<=>8.00.6001.18702 34682 77 2012-05-07_07-52-43
T<=>31212[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>http://tieba.baidu.com/p/1576129401 34682 77 2012-05-07_07-52-43
When I import this file, it returns an error:
df <- read.table("2012-05-07.txt", sep="\t", quote="", stringsAsFactors= FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 2 did not have 4 elements
If I add the argument fill = TRUE, no error is raised, but the resulting data set is wrong:
df <- read.table("2012-05-07.txt", sep="\t", fill= TRUE, quote="", stringsAsFactors=FALSE)
df[2, ]
V1
T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&
V2 V3 V4
The first cell should be
T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&#25105;&#29233;&#20320;;for(var i=1;i<=999;i++){PostHandler.post(rich_postor._option.url,c,function(I){rich_postor.showAddResult(I)},function(I){});};void 0[=]A<=>3504a4[=]B<=>540532[=]V<=>8.00.6001.18702
but it is
T<=>31200[=]P<=>iexplore.exe[=]I<=>1096[=]U<=>javascript:var c=rich_postor._getData();c.content=&
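The value breaks off at the # character. The string &#25105;&#29233;&#20320; seems to be the HTML-encoded Chinese for "I love you" (我爱你). Can anyone tell me how to get the whole string into one cell? Thanks a lot!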
Answer 0 (score: 0)
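By default, read.table treats # as a comment character and discards the rest of the line after it. Set the comment.char argument to an empty string to turn that off: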
read.table("file.txt", header = FALSE, quote = "", sep = "\t", comment.char = "", stringsAsFactors = FALSE)
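To see the effect of comment.char in isolation, here is a minimal sketch. It assumes a hypothetical two-line, tab-separated sample written to a temporary file; the field contents and the example.com URL are shortened stand-ins for the log lines in the question, not the real data.

```r
# Hypothetical sample: two tab-separated lines with four fields each.
# The second line contains '#' inside its first field.
lines <- c(
  "U<=>http://example.com/p/1\t34682\t77\t2012-05-07_07-52-43",
  "U<=>c.content=&#25105;&#29233;&#20320;;void 0\t34682\t77\t2012-05-07_07-52-43"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Default comment.char = "#": everything from the first '#' on is dropped,
# so the second line is left with a single, truncated field.
truncated <- read.table(tmp, sep = "\t", quote = "", fill = TRUE,
                        stringsAsFactors = FALSE)

# comment.char = "": '#' is read as ordinary data and the field stays whole.
full <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

print(truncated[2, "V1"])
print(full[2, "V1"])

unlink(tmp)
```

With comment.char = "", fill = TRUE should no longer be needed for this sample, since every line then has the same number of fields.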