Handling phrasal verbs in text mining

Asked: 2018-04-23 02:29:23

Tags: r text-mining tidytext

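Phrasal verbs are very important in everyday English usage. Is there any library in R that lets us handle them? I tried two approaches, but neither seems to handle them.

For example: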

library(sentimentr)
library(tidytext)
library(tidyverse)

x <- 'i vomit when i see her'
y <- 'i throw up when i see her'

# sentimentr
sentiment(x)  # gives a sentiment of -0.4
sentiment(y)  # gives a sentiment of 0

# Similarly, using tidytext
y %>% as_tibble() %>% 
    unnest_tokens(word, value) %>% 
    left_join(get_sentiments('bing'), by = 'word')  # no word matches the lexicon, so every sentiment is NA

I came up with a (clumsy) strategy for handling phrasal verbs:

# create a dummy phrasal verb sentiment score
phrasal_verb <- tibble(bigram = "throw up", bigram_score = -1)

# use tidytext to build bigrams, then join the phrasal verb scores and the Bing lexicon
y %>% as_tibble() %>% 
    unnest_tokens(bigram, value, token = 'ngrams', n = 2) %>% 
    separate(bigram, c('word', 'word2'), remove = FALSE) %>% 
    left_join(phrasal_verb, by = 'bigram') %>% 
    left_join(get_sentiments('bing'), by = 'word') %>% 
    # Bing labels are character ("positive"/"negative"), so map them to +1/-1
    mutate(word_score = case_when(sentiment == 'positive' ~ 1,
                                  sentiment == 'negative' ~ -1),
           sentiment_all = coalesce(bigram_score, word_score)) %>% 
    summarise(sentiment_sum = sum(sentiment_all, na.rm = TRUE))

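The result is -1, which indicates negative sentiment. Any ideas for improvement? Is there any dataset with sentiment scores for phrasal verbs?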
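One possible refinement, sketched here in base R under stated assumptions, is to merge each known phrasal verb into a single token before tokenizing and then score every token against one combined lookup. This avoids a quirk of the bigram join above, where the single-word lexicon is matched only on the first word of each bigram, so the last word of the sentence is never scored. The lexicons `phrasal_lexicon` and `word_lexicon`, their scores, and the helper `score_text` are all hypothetical and made up for illustration. They are not an existing phrasal verb dataset or part of any package.

```r
# Hypothetical phrasal-verb lexicon; the scores are invented for illustration
phrasal_lexicon <- data.frame(
  phrase = c("throw up", "cheer up"),
  score  = c(-1, 1),
  stringsAsFactors = FALSE
)

# Toy single-word lexicon standing in for a real one such as Bing
word_lexicon <- data.frame(
  word  = c("vomit", "happy"),
  score = c(-1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sums lexicon scores after merging phrasal verbs.
# Assumes the phrases contain no regex metacharacters.
score_text <- function(text, phrasal_lexicon, word_lexicon) {
  text <- tolower(text)
  # Join each known phrasal verb into one token, e.g. "throw up" -> "throw_up"
  for (p in phrasal_lexicon$phrase) {
    text <- gsub(paste0("\\b", p, "\\b"), gsub(" ", "_", p), text, perl = TRUE)
  }
  # Split on anything that is not a letter, underscore, or apostrophe
  tokens <- strsplit(text, "[^a-z_']+")[[1]]
  tokens <- tokens[tokens != ""]
  # Combined lookup: merged phrasal verbs plus single words
  lookup <- c(
    setNames(phrasal_lexicon$score, gsub(" ", "_", phrasal_lexicon$phrase)),
    setNames(word_lexicon$score, word_lexicon$word)
  )
  # Tokens missing from the lookup come back as NA and are ignored in the sum
  sum(lookup[tokens], na.rm = TRUE)
}

score_x <- score_text("i vomit when i see her", phrasal_lexicon, word_lexicon)
score_y <- score_text("i throw up when i see her", phrasal_lexicon, word_lexicon)
```

With these toy lexicons, `score_x` and `score_y` are both -1, so the two sentences are treated alike. The sketch matches only the exact surface form, so inflected or separated variants such as "threw up" or "throw it up" would need extra patterns or lemmatization.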

0 answers:

No answers yet.