Question

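I keep going around in circles on what should be a simple problem. How can I count the number of string values in sentiment multiplied by (*) the integer in another column, times_used? Perhaps with group_by() and summarise()? I am using the following data frame: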

        word times_used sentiment
       <chr>      <int>     <chr>
 1      fake         68  negative
 2       bad         36  negative
 3 president         35  positive
 4       tax         32  negative
 5   failing         21  negative
 6      vote         20  negative
 7      vote         20  positive
 8      deal         19  positive
 9       job         19  positive
10    united         19  positive
# ... with 475 more rows

Ultimately I am looking for something like this:

times_used sentiment
     <int>     <chr>
      4090  negative
      3198  positive

Answer 1

If I understand correctly, you want something like this:

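library(dplyr)

df %>%
  group_by(sentiment) %>%
  summarise(count = n(),                  # number of rows (words) per sentiment
            words = sum(times_used)) %>%  # summed times_used per sentiment
  mutate(total = count * words)           # row count multiplied by the summed uses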
#  A tibble: 2 x 4
#  sentiment count words total
#     <fctr> <int> <int> <int>
#1  negative     5   177   885
#2  positive     5   112   560

If you only want those two columns, you can chain select(sentiment, total) onto the end of the pipeline.
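As a minimal, self-contained sketch of that chained version, the snippet below rebuilds only the ten rows printed in the question (the full data frame has 475 more rows, so the real totals will differ). The names result and sums are illustrative and not from the original post. The second pipeline is an assumption about the intent: the desired output in the question looks like a plain per-sentiment sum of times_used, in which case the multiplication by the row count may not be needed.

```r
library(dplyr)

# Only the ten rows shown in the question; the real data frame is larger
df <- data.frame(
  word = c("fake", "bad", "president", "tax", "failing",
           "vote", "vote", "deal", "job", "united"),
  times_used = c(68L, 36L, 35L, 32L, 21L, 20L, 20L, 19L, 19L, 19L),
  sentiment = c("negative", "negative", "positive", "negative", "negative",
                "negative", "positive", "positive", "positive", "positive"),
  stringsAsFactors = FALSE
)

# The answer's pipeline with select() chained on to keep two columns
result <- df %>%
  group_by(sentiment) %>%
  summarise(count = n(),
            words = sum(times_used)) %>%
  mutate(total = count * words) %>%
  select(sentiment, total)

# Assumption: if only the summed times_used per sentiment is wanted
sums <- df %>%
  group_by(sentiment) %>%
  summarise(times_used = sum(times_used)) %>%
  select(times_used, sentiment)

print(result)
print(sums)
```

On these ten rows, result holds the totals 885 and 560 from the answer's output, while sums holds 177 and 112, the values in the words column above.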
