Question

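I'm trying to use R to get a grouped count of a string in a data frame, but so far I haven't been able to come up with a solution. Below is some sample data and the code I tried, to give you a rough idea of what I'm after, with further explanation underneath: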

[sample data and attempted code not preserved]

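So I first group the data by season, and then try to count the total number of times the word "Homer" appears in the titles of the episodes in a given season.

Any advice on where I'm going wrong would be greatly appreciated.

Best, Curtis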

Answer 1

To add a new variable to each row, you need the mutate function. You don't need group_by unless you want to summarize by group:

library(dplyr)
library(stringr)  # provides str_count()
simpson %>%
    mutate(homer_count = str_count(episode_title, 'Homer'))

# A tibble: 100 x 5
   season episode_title                     imdb_votes us_viewers_in_millions homer_count
    <int> <chr>                                  <int>                  <dbl>       <int>
 1      1 Simpsons Roasting on an Open Fire       3734                   26.7           0
 2      1 Bart the Genius                         1973                   24.5           0
 3      1 Homer's Odyssey                         1709                   27.5           1
 4      1 There's No Disgrace Like Home           1701                   20.2           0
 5      1 Bart the General                        1732                   27.1           0
 6      1 Moaning Lisa                            1674                   27.4           0
 7      1 The Call of the Simpsons                1638                   27.6           0
 8      1 The Telltale Head                       1580                   28             0
 9      1 Life on the Fast Lane                   1578                   33.5           0
10      1 Homer's Night Out                       1511                   30.3           1
# ... with 90 more rows
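
To see what str_count is doing on its own, here is a minimal sketch on a plain character vector. The three titles are copied from the output above; the vector name `titles` and the mixed-case test string are only illustrative assumptions.

```r
library(stringr)

# Three titles copied from the printed output above; `titles` is an illustrative name.
titles <- c("Bart the Genius", "Homer's Odyssey", "There's No Disgrace Like Home")

# str_count() is vectorised: it returns one count per element.
per_title <- str_count(titles, "Homer")
per_title
#> [1] 0 1 0

# The pattern is treated as a case-sensitive regular expression by default,
# so "Home" and "homer" do not match; regex(ignore_case = TRUE) relaxes the case.
any_case <- str_count("homer and Homer", regex("homer", ignore_case = TRUE))
any_case
#> [1] 2
```

Because the result has one element per title, it can be dropped straight into mutate as a new column.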

If you want to count how many times "Homer" is used in each season, use group_by, then summarize to produce a new variable with one row per group:

simpson %>%
    group_by(season) %>%
    summarize(homer_count = sum(str_count(episode_title, 'Homer')))

# A tibble: 5 x 2
  season homer_count
   <int>       <int>
1      1           2
2      2           2
3      3           4
4      4           2
5      5           7
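
Note that sum(str_count(...)) counts occurrences, so a title containing the word twice contributes 2. If what you want is the number of episodes whose title mentions Homer, str_detect may be the better fit. A small sketch of the difference, assuming made-up toy data (the data frame `toy` and its titles are hypothetical, not your data set):

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not the asker's data set.
toy <- tibble(
  season = c(1L, 1L, 2L, 2L),
  episode_title = c("Homer's Odyssey", "Bart the Genius",
                    "Homer vs. Homer", "Lisa's Substitute")
)

per_season <- toy %>%
  group_by(season) %>%
  summarize(
    homer_count    = sum(str_count(episode_title, "Homer")),   # total occurrences
    homer_episodes = sum(str_detect(episode_title, "Homer"))   # titles with at least one match
  )

per_season
```

In season 2 of the toy data the two columns differ: the single title "Homer vs. Homer" counts as two occurrences but only one episode.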

Answer 2

library(dplyr)
library(stringr)  # str_count() comes from stringr, not dplyr

simpson %>%
  mutate(counts = str_count(episode_title, "Homer")) %>%  # count matches for each row (vectorised function)
  group_by(season) %>%                                    # for each season
  summarise(sum_counts = sum(counts))                     # sum counts

# # A tibble: 5 x 2
#   season sum_counts
#    <int>      <int>
# 1      1          2
# 2      2          2
# 3      3          4
# 4      4          2
# 5      5          7
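
As a possible shortcut, the mutate / group_by / summarise chain can probably be collapsed with count() and its wt argument, which sums a weight within each group. This is a sketch on made-up data, assuming a reasonably recent dplyr; `toy` and `season_totals` are hypothetical names.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for `simpson`.
toy <- tibble(
  season = c(1L, 1L, 2L),
  episode_title = c("Homer's Odyssey", "Homer's Night Out", "Bart Gets an F")
)

# count() with `wt` sums the weight within each group, so this should be
# equivalent to group_by(season) followed by summarise(sum_counts = sum(...)).
season_totals <- toy %>%
  count(season, wt = str_count(episode_title, "Homer"), name = "sum_counts")

season_totals
```

Seasons with no match still appear, with a count of 0, because every row contributes its (possibly zero) weight to its group.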

Counting a string pattern by group in R
