Split a paragraph into a vector of sentences in R

Asked: 2016-11-08 05:18:42

Tags: r text-mining

I have the following paragraph:

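  Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)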

To apply the calculate_total_presence_sentiment function from the RSentiment package, I would like to split this paragraph into sentences as follows:

[1] "Well, um...such a personal topic."                                       
[2] "No wonder I am the first to write a review."                             
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."           
[5] "'Nuff said."                                                             
[6] ":-)"

Thanks a lot for your help with this.

3 answers:

Answer 0 (score: 1)

qdap has a convenient function for this:

  

sent_detect_nlp - Detect and split sentences on endmark boundaries using openNLP & NLP utilities, matching the old version of the openNLP package's now-removed sentDetect function.

library(qdap)

txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

sent_detect_nlp(txt)
#[1] "Well, um...such a personal topic."                                       
#[2] "No wonder I am the first to write a review."                             
#[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
#[4] "And I had, well, major problems in this area and now I don't."           
#[5] "'Nuff said."                                                             
#[6] ":-)"

Answer 1 (score: 0)

A quick-and-dirty solution
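One quick-and-dirty way to do this in base R, sketched here as an assumption and not as this answer's original code, is to split on whitespace that directly follows sentence-ending punctuation. The regular expression is illustrative; it will also split after abbreviations such as "Dr.", so treat it as a rough heuristic.

```r
txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

# Split on runs of whitespace that are preceded by '.', '!' or '?'.
# The lookbehind keeps the punctuation attached to the sentence,
# and "um...such" is left alone because no whitespace follows the dots.
sentences <- strsplit(txt, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

print(sentences)
```

For this particular paragraph the split yields six elements, ending with "'Nuff said." and ":-)".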


Use one of the tools listed in the CRAN Natural Language Processing task view: https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
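As an illustration of one tool from that task view, the sketch below uses the NLP and openNLP packages for sentence detection. It assumes both packages (and the Java runtime that openNLP depends on) are installed; the variable names are hypothetical, and the block simply skips the annotation step if the packages cannot be loaded.

```r
txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

sentences <- NULL

# Only run if NLP and openNLP (which needs Java) are available.
if (requireNamespace("NLP", quietly = TRUE) &&
    requireNamespace("openNLP", quietly = TRUE)) {
  s <- NLP::as.String(txt)
  # Maxent sentence detector using the default English model.
  annotator <- openNLP::Maxent_Sent_Token_Annotator()
  spans <- NLP::annotate(s, annotator)
  # Subsetting the String by the sentence annotations extracts each sentence.
  sentences <- s[spans]
  print(sentences)
}
```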

Answer 2 (score: -1)

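You can save the text as a .txt file. Make sure each line of the .txt file contains one sentence that you want to read as a vector element. Then use the base function readLines('filepath/filename.txt'). The resulting character vector holds each line of the original text file as one element.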

> mylines <- readLines('text.txt')
Warning message:
In readLines("text.txt") : incomplete final line found on 'text.txt'
> mylines
[1] "Well, um...such a personal topic."                                       
[2] "No wonder I am the first to write a review."                             
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."           
[5] "'Nuff said'."                                                            
[6] ":-)"

> mylines[3]
[1] "Suffice to say this stuff does just what they claim and tastes pleasant."
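For a self-contained version of this approach, the sketch below writes the sentences to a temporary file, one per line, and reads them back. The temporary file and the `sentences` variable are illustrative assumptions, not part of the original answer. Because writeLines terminates the last line with a newline, readLines does not emit the "incomplete final line" warning shown above.

```r
# Sentences already separated by hand, one per element.
sentences <- c(
  "Well, um...such a personal topic.",
  "No wonder I am the first to write a review.",
  "Suffice to say this stuff does just what they claim and tastes pleasant.",
  "And I had, well, major problems in this area and now I don't.",
  "'Nuff said.",
  ":-)"
)

# Write one sentence per line to a temporary text file.
path <- tempfile(fileext = ".txt")
writeLines(sentences, path)

# readLines returns a character vector with one element per line.
mylines <- readLines(path)
print(mylines[3])

# Clean up the temporary file.
unlink(path)
```

Note that this only reads text that has already been split into one sentence per line; it does not detect sentence boundaries itself.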