Question

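I am trying to use udpipe's RAKE to generate a list of 25 RAKE tokens for each document in a data frame, and to write those tokens (plus a simple str_count) back to the data frame. I built a for loop to handle this, but it writes the same result to every row instead of a different result to each row.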

The packages installed and in use are udpipe, dplyr, stringi, stringr, and data.table.

annotation$length <- nchar(annotation$token)

annotation <- annotation %>% filter(length >= 3 )

counter <- textdf$doc_id

for (i in counter) {
  subannotation <- annotation %>% filter(doc_id == i)
  stats <-
    keywords_rake(
      x = subannotation,
      term = "token", #token or lemma
      group = "doc_id",
      ngram_max = 3,
      n_min = 1,
      relevant = subannotation$upos %in% c("NOUN", "VERB", "ADV", "ADJ")
    )
  stats <- stats %>% top_n(25,rake)
  checktopics <- paste(stats$keyword, collapse =  " ")
  textdf$topics <- checktopics
  textdf$score <- str_count(checktopics,"cheese")

}

The expected result should look something like:

id score topics
1  12    chocolate chocoholics cheese
2  1     plastic waste cheese
3  3     neuroscientists data system

The current result is:

id score topics
1  3     neuroscientists data system
2  3     neuroscientists data system
3  3     neuroscientists data system

What am I doing wrong?

Thanks!

Answer 1

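The appropriate fix is to index the row being written inside the loop. Assigning to textdf$topics without an index overwrites the whole column on every iteration, so only the last document's result survives. Derp.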

textdf$topics[i] <- checktopics
textdf$score[i] <- str_count(checktopics,"cheese")
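
Note that indexing with `i` relies on the doc_id values also being valid row numbers (1, 2, 3, ...), as they appear to be in the example output. If doc_id holds character labels instead, looping over row positions is a safer variant. The following is only a minimal sketch of that idea in base R: `topics_by_doc` and `row` are hypothetical names, and the hard-coded keyword strings stand in for what keywords_rake() plus paste() would produce in the real loop.

```r
# Toy data frame with character doc_ids (an assumption for illustration).
textdf <- data.frame(doc_id = c("a", "b", "c"), stringsAsFactors = FALSE)

# Hypothetical stand-in for the per-document keyword strings; in the real
# loop these would come from keywords_rake() and paste(..., collapse = " ").
topics_by_doc <- c(
  a = "chocolate chocoholics cheese",
  b = "plastic waste cheese",
  c = "neuroscientists data system"
)

# Pre-allocate the result columns so each row can be filled individually.
textdf$topics <- NA_character_
textdf$score <- NA_integer_

# Loop over row positions rather than doc_id values, so the index is always
# a valid row number regardless of what doc_id contains.
for (row in seq_len(nrow(textdf))) {
  id <- textdf$doc_id[row]
  checktopics <- topics_by_doc[[id]]
  textdf$topics[row] <- checktopics
  # Base R equivalent of str_count(checktopics, "cheese"):
  # gregexpr() returns -1 when there is no match, so count positive hits.
  hits <- gregexpr("cheese", checktopics, fixed = TRUE)[[1]]
  textdf$score[row] <- sum(hits > 0)
}

print(textdf)
```

Looking up the row position first and using it for every write keeps the assignment tied to a single row, which is the same fix as above, just without assuming anything about the doc_id values.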
