Question

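I have a data frame whose columns contain the words that make up an ngram. I want to count the number of stopwords in each ngram and add that count as a new column to the data frame, but I can't think of an elegant way to do this for multiple values of n (4-grams, 5-grams, etc.).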

So far I have been doing the following:

library(dplyr)     # mutate(), %>%
library(tidytext)  # stop_words
mutate(Bigram_Counts_By_Company,
   stopword_count = (word1 %in% stop_words$word) %>% as.integer() +
                    (word2 %in% stop_words$word) %>% as.integer())
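The hand-written sum above (one %in% test per column, added together) extends to any n by folding over whichever columns match the prefix. A minimal base-R sketch, assuming a small hypothetical stopword vector standing in for tidytext's stop_words$word and a toy trigram data frame (both illustrative, not the asker's data):

```r
# Hypothetical stand-in for tidytext::stop_words$word
stopwords <- c("the", "of", "and", "a")

# Toy trigram table with columns word1..word3 (illustrative data only)
trigrams <- data.frame(
  word1 = c("the", "bank", "a"),
  word2 = c("bank", "of", "lot"),
  word3 = c("account", "america", "of"),
  stringsAsFactors = FALSE
)

# Select every column whose name starts with "word", whatever n is
word_cols <- trigrams[startsWith(names(trigrams), "word")]

# Turn each column into a 0/1 integer vector, then add them element-wise,
# which is the same as word1 + word2 + ... written out by hand
trigrams$stopword_count <- Reduce(
  `+`,
  lapply(word_cols, function(col) as.integer(col %in% stopwords))
)
trigrams
```

Because the column set is chosen by name prefix, the same code works unchanged for bigram, 4-gram or 5-gram tables.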

This works, but I would prefer to write a general function that does the same thing for all columns whose names start with "word".
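One possible shape for such a general function, sketched with dplyr (assuming dplyr 1.0 or later for across()): count, per row, how many of the prefix-matched columns hold a stopword by applying %in% column by column and summing across each row with rowSums(). The helper name add_stopword_count, the stopword vector and the toy 4-gram data are all hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for tidytext::stop_words$word
stopwords <- c("the", "of", "and", "a")

# Hypothetical helper: adds stopword_count = number of columns starting
# with `prefix` whose value appears in `stopwords`, computed row by row
add_stopword_count <- function(df, stopwords, prefix = "word") {
  df %>%
    mutate(stopword_count = rowSums(
      # across() yields one logical column per matching word column;
      # rowSums() then counts the TRUE values in each row
      across(starts_with(prefix), ~ .x %in% stopwords)
    ))
}

# Toy 4-gram table (illustrative data only)
fourgrams <- tibble(
  word1 = c("the", "hold", "a"),
  word2 = c("end", "your", "piece"),
  word3 = c("of", "horses", "of"),
  word4 = c("the", "now", "cake")
)

fourgrams %>% add_stopword_count(stopwords)
```

The key difference from the failing attempt is that %in% is applied inside across() to each column separately, so every intermediate result still has one value per row.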

What I would like to do:

mutate(Web_Bigram_Counts_By_Company,
   stopword_count = select(Web_Bigram_Counts_By_Company, starts_with("word")) %in% stop_words$word)

select(Web_Bigram_Counts_By_Company, starts_with("word")) works fine on its own for selecting the columns whose names start with "word", but when I use it inside the mutate call I get this error: Column 'stopword_count' must be length 360463 (the number of rows) or one, not 2
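A likely explanation for the "not 2": select() returns a data frame, which is a list of columns, and %in% on a list tests each whole element (each column) against the table. The result therefore has one value per column rather than one per row. A small base-R demonstration, using hypothetical stopwords and a toy bigram table:

```r
# Hypothetical stopwords and a two-column bigram table (illustrative only)
stopwords <- c("the", "of")
bigrams <- data.frame(word1 = c("the", "bank", "end"),
                      word2 = c("bank", "of", "game"),
                      stringsAsFactors = FALSE)

# %in% treats the data frame as a list: one result per column, not per row
by_column <- bigrams %in% stopwords
length(by_column)  # 2, the number of columns

# Applying %in% to each column instead keeps one value per row
per_cell <- sapply(bigrams, function(col) col %in% stopwords)
dim(per_cell)      # 3 rows x 2 columns
rowSums(per_cell)  # stopword count for each row
```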

Is this just a simple base R mistake, or am I approaching this the wrong way?

R: How to check whether the values of multiple columns are listed in
