How to read text data correctly

Time: 2017-06-05 14:13:41

Tags: r text

I have just started doing text analysis in R. When I read in some sample text data, I get the following result.

sms_raw <- read.csv("sms_spam.csv", stringsAsFactors = FALSE)
> str(sms_raw)
'data.frame':   5559 obs. of  2 variables:
$ type         : chr  "ham" "ham" "ham" "spam,\"complimentary 4 STAR Ibiza 
Holiday or £10,000 cash needs your URGENT collection. 09066364349 NOW from 
Landline not to l"| __truncated__ ...
$ text.........: chr  "Hope you are having a good week. Just checking 
in;;;;;;;;;" "K..give back my thanks.;;;;;;;;;" "Am also doing in cbe only. 
But have to pay.;;;;;;;;;" "" ...

It looks to me as if the variables were not separated correctly. Inspecting the data further with the head function, I get the following:

head(sms_raw)

type
1                                                                                                                                                                    
ham
2                                                                                                                                                                    
ham
3                                                                                                                                                                    
ham
4 spam,"complimentary 4 STAR Ibiza Holiday or £10,000 cash needs your 
URGENT collection. 09066364349 NOW from Landline not to lose out! 
Box434SK38WP150PPM18+";;;;;;;;;
5                                                                                                                                                                   
spam
6                                                                                                                                                                    
ham

text.........
1                                                                                                                 
Hope you are having a good week. Just checking in;;;;;;;;;
2                                                                                                                                           
K..give back my thanks.;;;;;;;;;
3                                                                                                                       
Am also doing in cbe only. But have to pay.;;;;;;;;;

Does anyone have a suggestion for how to fix this?

1 answer:

Answer 0: (score: 0)

Try data.table::fread("sms_spam.csv", stringsAsFactors = FALSE, sep = ";")

Edit

You could also try reading the raw lines: input_file <- readLines("/path/of/sms_spam.csv")
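
Once the raw lines are in memory, one possible way to continue is to clean them up by hand. The following is only a rough sketch and not a tested fix for the real file: the sample lines in raw_lines are hypothetical, made up to resemble the output in the question (a comma between type and text, trailing semicolons, and some rows wrapped entirely in quotes). With the actual file you would use the result of readLines() instead.

```r
# Hypothetical sample lines standing in for readLines("/path/of/sms_spam.csv");
# the true layout of the file is an assumption based on the question's output.
raw_lines <- c(
  "type,text;;;;;;;;;",
  "ham,Hope you are having a good week. Just checking in;;;;;;;;;",
  "ham,K..give back my thanks.;;;;;;;;;",
  '"spam,""complimentary 4 STAR Ibiza Holiday, cash prize""";;;;;;;;;'
)

# 1. Drop the run of semicolons at the end of every line.
clean <- sub(";+$", "", raw_lines)

# 2. Some rows are wrapped in quotes as a whole, with inner quotes doubled.
#    Unwrap those rows and turn "" back into ".
wrapped <- grepl('^".*"$', clean)
clean[wrapped] <- gsub('""', '"', sub('^"(.*)"$', "\\1", clean[wrapped]))

# 3. Skip the header and split each row at the FIRST comma only, so that
#    commas inside the message text do not create extra columns.
body <- clean[-1]
type <- sub(",.*$", "", body)
text <- sub("^[^,]*,", "", body)

# 4. Remove quotes that still surround the message text.
text <- gsub('^"|"$', "", text)

sms <- data.frame(type = type, text = text, stringsAsFactors = FALSE)
str(sms)
```

Splitting at the first comma by hand, rather than passing the cleaned lines back to read.csv, is a deliberate choice here: SMS messages often contain commas, and unquoted commas would otherwise be treated as field separators.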