Calculating the frequency of strings in a data frame using summarise

Time: 2018-01-07 22:11:27

Tags: r dplyr

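I tried searching on SO but could not find a simple answer. My question is very simple.

Take the mpg dataset from the ggplot2 package. I want to group by model and manufacturer first, and then count how often each string ("f", "r", "4") occurs in the drv column.

I tried: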

mpg %>% 
  group_by(model, manufacturer) %>% 
  summarise(sum_drv = sum(drv))

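This does not work because the drv column contains character values, and sum() only accepts numeric input.

Thanks,

3 answers:

Answer 0: (score: 2)

You did not show your desired output, so here are two different formats you could use.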

All drv values in a single column:

library(tidyverse)

mpg %>% count(model, manufacturer, drv) 

# # A tibble: 38 x 4
#   model              manufacturer drv       n
#   <chr>              <chr>        <chr> <int>
# 1 4runner 4wd        toyota       4         6
# 2 a4                 audi         f         7
# 3 a4 quattro         audi         4         8
# 4 a6 quattro         audi         4         3
# 5 altima             nissan       f         6
# 6 c1500 suburban 2wd chevrolet    r         5
# 7 camry              toyota       f         7
# 8 camry solara       toyota       f         7
# 9 caravan 2wd        dodge        f        11
#10 civic              honda        f         9
# # ... with 28 more rows

Each drv value as its own column:

mpg %>% count(model, manufacturer, drv) %>% spread(drv, n, fill=0)

# # A tibble: 38 x 5
#   model              manufacturer   `4`     f     r
# * <chr>              <chr>        <dbl> <dbl> <dbl>
# 1 4runner 4wd        toyota        6.00  0     0   
# 2 a4                 audi          0     7.00  0   
# 3 a4 quattro         audi          8.00  0     0   
# 4 a6 quattro         audi          3.00  0     0   
# 5 altima             nissan        0     6.00  0   
# 6 c1500 suburban 2wd chevrolet     0     0     5.00
# 7 camry              toyota        0     7.00  0   
# 8 camry solara       toyota        0     7.00  0   
# 9 caravan 2wd        dodge         0    11.0   0   
#10 civic              honda         0     9.00  0   
# # ... with 28 more rows
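
As a side note, spread() has since been superseded in tidyr by pivot_wider(). The following is a minimal sketch of the same reshaping, assuming tidyr 1.0.0 or later is installed; the name wide_counts is only illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # only needed for the mpg dataset

# Count rows per model/manufacturer/drv, then give each drv value its own column.
# values_fill puts 0 where a model has no cars with that drive type.
wide_counts <- mpg %>%
  count(model, manufacturer, drv) %>%
  pivot_wider(names_from = drv, values_from = n, values_fill = list(n = 0L))

wide_counts
```

Unlike the spread() call above, this keeps the counts as integers, because the fill value 0L is itself an integer.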

Answer 1: (score: 1)

I think table is what you are looking for:

mpg %>% 
 group_by(model, manufacturer, drv) %>% 
 summarise(sum_drv = as.numeric(table(drv)))
# # A tibble: 38 x 4
# # Groups:   model, manufacturer [?]
#                 model manufacturer   drv sum_drv
#                 <chr>        <chr> <chr>   <dbl>
#  1        4runner 4wd       toyota     4       6
#  2                 a4         audi     f       7
#  3         a4 quattro         audi     4       8
#  4         a6 quattro         audi     4       3
#  5             altima       nissan     f       6
#  6 c1500 suburban 2wd    chevrolet     r       5
#  7              camry       toyota     f       7
#  8       camry solara       toyota     f       7
#  9        caravan 2wd        dodge     f      11
# 10              civic        honda     f       9
# # ... with 28 more rows

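Answer 2: (score: 1)

You can make a simple change to get the count/frequency, as shown below. Include drv as part of the group_by, and then count the rows in each group with n().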

mpg %>% 
  group_by(model, manufacturer, drv) %>% 
  summarise(sum_drv = n())

Results:
# A tibble: 38 x 4
# Groups:   model, manufacturer [?]
                model manufacturer   drv sum_drv
                <chr>        <chr> <chr>   <int>
 1        4runner 4wd       toyota     4       6
 2                 a4         audi     f       7
 3         a4 quattro         audi     4       8
 4         a6 quattro         audi     4       3
 5             altima       nissan     f       6
 6 c1500 suburban 2wd    chevrolet     r       5
 7              camry       toyota     f       7
 8       camry solara       toyota     f       7
 9        caravan 2wd        dodge     f      11
10              civic        honda     f       9
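
The grouped summarise above and the count() call from the first answer should give the same table. A small sketch to check this, assuming dplyr 1.0.0 or later (for the .groups argument); the names by_count and by_group are only illustrative:

```r
library(dplyr)
library(ggplot2)  # only needed for the mpg dataset

# count() is shorthand for group_by() + summarise(n = n());
# name = "sum_drv" makes the count column match the one used above.
by_count <- mpg %>%
  count(model, manufacturer, drv, name = "sum_drv")

# The explicit version; .groups = "drop" returns an ungrouped tibble,
# so the two results can be compared directly.
by_group <- mpg %>%
  group_by(model, manufacturer, drv) %>%
  summarise(sum_drv = n(), .groups = "drop")

# Compare as plain data frames to ignore tibble-specific attributes.
all.equal(as.data.frame(by_count), as.data.frame(by_group))
```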