Special characters in tm package / qdap output

Date: 2017-11-07 09:36:23

Tags: r special-characters tm qdap

I am trying to create a term-document matrix in R using the following dataset:

  EmailSubject
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Thank You Chennai
Limited Period offer
 Valentines day special
 Buy a phone at 10000 and get a new sim free
Buy a phone at 10000 and get a new sim free
Buy the stunning new phone
The game changer is here.

Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer

I have used qdap and its freq_terms function. Below is the expected output:

  freq_terms(DF)

  Expected Output    Frequency
  Buy                4
  Get                5
  a                  7
  thank              12
  Stunning           6
  The                7
  New                10
  Valentines         4
  phone              7
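
For reference, the kind of per-term count shown above can be approximated in base R. This is only a minimal sketch, not how freq_terms is implemented: the three sample subjects are taken from the dataset, while the `subjects` and `freq` names and the split-on-non-alphanumerics tokenization are assumptions made for illustration.

```r
# Three subjects copied from the dataset above (a small sample, not the full data)
subjects <- c("Buy the stunning new phone",
              "The game changer is here.",
              "Buy a phone at 10000 and get a new sim free")

# Lower-case, then split on anything that is not a letter or digit
words <- unlist(strsplit(tolower(subjects), "[^a-z0-9]+"))
words <- words[nzchar(words)]  # drop empty tokens, if any

# Count each term and sort by descending frequency
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```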

The following special characters keep appearing and make the data unusable:

           valentinea€™s, a€™s instead of valentines, as. I have also tried the same with the tm package.

I used gsub to replace these characters, but it did not work very well. Can anyone suggest a fix?
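
Sequences like a€™ typically appear when a typographic apostrophe (U+2019) stored as UTF-8 is read as Windows-1252, so one thing worth trying is to strip both the apostrophe and its damaged form before building the matrix. The sketch below assumes that cause, which the question does not confirm. `clean_subject` is a hypothetical helper, not part of tm or qdap, and the two sample strings are made up for illustration.

```r
# Hypothetical helper (not from tm or qdap): removes typographic apostrophes
# and the mojibake they become when UTF-8 text is read as Windows-1252.
clean_subject <- function(x) {
  x <- enc2utf8(x)
  # Typographic apostrophe (U+2019) that was read correctly
  x <- gsub("\u2019", "", x, fixed = TRUE)
  # The same character after a wrong-encoding read: a-circumflex, euro sign, trademark sign
  x <- gsub("\u00e2\u20ac\u2122", "", x, fixed = TRUE)
  # Variant with a plain "a", as displayed in the question
  x <- gsub("a\u20ac\u2122", "", x, fixed = TRUE)
  # Drop any remaining non-ASCII characters
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

# Made-up sample input covering both forms
subjects <- c("Valentine\u2019s day special",
              "valentinea\u20ac\u2122s day special")
cleaned <- clean_subject(subjects)
print(cleaned)
```

If the data comes from a file, declaring the encoding at read time (for example the `fileEncoding` argument of `read.csv`) may prevent the damage in the first place. Whether that applies depends on how the file was actually saved.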

0 answers:

No answers yet