Question

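I have two data frames, each containing a column of words and a score associated with each word. I want to run comments through these data frames and create an additive score based on which words appear in each sentence.

I want to do this across many, many comments, so it needs to be computationally efficient. For example, the sentence "hi, he said. why is it okay" scores .98 + .1 + .2, because the words "hi", "why", and "okay" are in the data frames. Any sentence may contain words from more than one data frame.

Can anyone help me create the column "add_score" using a process that scales well to large data frames? Thanks.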

a <- data.frame(words = c("hi","no","okay","why"), score = c(.98,.5,.2,.1))
b <- data.frame(words = c("bye","yes","here"), score = c(.5,.3,.2))
comment_df <- data.frame(id = c("1","2","3"),
                         comments = c("hi, he said. why is it okay",
                                      "okay okay okay no",
                                      "yes, here is it"))
# Desired result
comment_df$add_score <- c(1.28,1.1,.5)

Answer 1

This solution uses functions from tidyverse and stringr.

# Load packages
library(tidyverse)
library(stringr)

# Merge a and b to create score_df
score_df <- bind_rows(a, b)

# Create a function to calculate score for one string
string_cal <- function(string, score_df){

  temp <- score_df %>%
    # Count how many times each word occurs in the string
    mutate(Number = str_count(string, pattern = fixed(words))) %>%
    # Calculate the score contributed by each word
    mutate(Total_Score = score * Number) 

  # Return the sum
  return(sum(temp$Total_Score))
}

# Use map_dbl to apply the string_cal function over comments
# The results are stored in the add_score column
comment_df <- comment_df %>%
  mutate(add_score = map_dbl(comments, string_cal, score_df = score_df))
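
One thing worth checking before relying on the function above: fixed() matches substrings, so a short word such as "no" is also counted inside "know". The sketch below illustrates the difference and shows one possible whole-word variant. The example sentence and the \\b word-boundary pattern are illustrative assumptions, and the pattern only behaves as intended if the words contain no regex metacharacters.

```r
library(stringr)

words <- c("hi", "no")               # illustrative subset of the word list
text  <- "I know nothing about this" # hypothetical comment

# Substring matching, as in string_cal(): "no" is found inside "know" and
# "nothing", and "hi" is found inside "nothing" and "this"
substring_counts <- str_count(text, pattern = fixed(words))

# Whole-word matching (assumption: words are plain letters, so wrapping
# them in \\b word boundaries is safe)
whole_word_counts <- str_count(text, pattern = regex(paste0("\\b", words, "\\b")))
```

If whole-word matching is what the scores are meant to reflect, the same regex() pattern could be swapped into the str_count() call inside string_cal.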

Data preparation

a <- data.frame(words = c("hi","no","okay","why"),
                score = c(.98,.5,.2,.1))
b <- data.frame(words = c("bye","yes","here"),
                score = c(.5,.3,.2))
comment_df <- data.frame(id = c("1","2","3"),
                         comments = c("hi, he said. why is it okay",
                                      "okay okay okay no",
                                      "yes, here is it"))
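
For a very large number of comments, an alternative worth considering is to split each comment into words once and join against the score table, instead of scanning every comment for every word. The following is a minimal sketch, not the method used in the answer above: the tokenizer pattern [a-z']+, the lowercasing step, and the name token_scores are assumptions made for illustration, and unlike fixed() this approach counts whole words only.

```r
library(dplyr)
library(tidyr)
library(stringr)

a <- data.frame(words = c("hi", "no", "okay", "why"),
                score = c(.98, .5, .2, .1), stringsAsFactors = FALSE)
b <- data.frame(words = c("bye", "yes", "here"),
                score = c(.5, .3, .2), stringsAsFactors = FALSE)
comment_df <- data.frame(id = c("1", "2", "3"),
                         comments = c("hi, he said. why is it okay",
                                      "okay okay okay no",
                                      "yes, here is it"),
                         stringsAsFactors = FALSE)

score_df <- bind_rows(a, b)

# One row per (comment, token); tokens are lowercase runs of letters/apostrophes
token_scores <- comment_df %>%
  mutate(words = str_extract_all(str_to_lower(comments), "[a-z']+")) %>%
  unnest(words) %>%
  # Keep only tokens that have a score, then total them per comment
  inner_join(score_df, by = "words") %>%
  group_by(id) %>%
  summarise(add_score = sum(score))

# Attach the totals; comments with no scored words get 0
result <- comment_df %>%
  left_join(token_scores, by = "id") %>%
  mutate(add_score = coalesce(add_score, 0))
```

On the example data, result$add_score should match the scores given in the question. Whether this is actually faster than the map_dbl approach depends on the size of the word list and the length of the comments, so it is worth benchmarking on real data.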
