Calculating descriptive statistics by group

Posted: 2019-02-05 20:52:16

Tags: r dplyr statistics grouping

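I have datasets collected from 2 different universities. Each one contains student information such as country, grades and age. For each university, I want to compute the minimum, mean, maximum and standard deviation of grades and age for each country (grouped by country), and put the results in a table.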

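The code I am using is below. I repeat it for each university, and again for the min, max and standard deviation. Repeating it is fine, but to build the table I have to go back to Excel and merge the statistics this code produces. Is there a direct way to do this in R?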

 library(dplyr)
 stats_gr <- data %>%
 select(Country, Grades, Age) %>%
 group_by(Country) %>%
 summarise(Grades = mean(Grades), Age = mean(Age))

I want a table like this

2 answers:

Answer 0 (score: 1)

I solved this by computing the statistics with dplyr and formatting the table with knitr's kable() function.

I generated a fake dataset just to populate the table. It is a single dataset containing data for each country from both universities.
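If your data is stored as one data frame per university instead, you can stack them into one data frame with an extra `University` column before grouping. This removes the need to merge results by hand afterwards. The sketch below assumes two hypothetical data frames, `uni1` and `uni2`, whose few rows are made up for illustration. It relies on `dplyr::bind_rows()`, which takes a named list and an `.id` argument.

```r
library(dplyr)

# Hypothetical per-university data frames (names and values are made up)
uni1 <- tibble::tibble(Countries = c("USA", "UK"), Grades = c(46, 84), Age = c(29, 30))
uni2 <- tibble::tibble(Countries = c("USA", "UK"), Grades = c(40, 54), Age = c(21, 20))

# bind_rows() stacks the data frames; with a named list, .id adds a column
# holding each element's name, which can then be used as a grouping variable
combined <- bind_rows(list(`University-1` = uni1, `University-2` = uni2),
                      .id = "University")
combined
```

`combined` then has the same University / Countries / Grades / Age layout as the fake dataset below, so the same `group_by(University, Countries)` pipeline applies to it.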


library(dplyr)
df <- tibble::tribble(
           ~University, ~Countries, ~Grades, ~Age,
        "University-1",      "USA",      46,    29,
        "University-1",       "UK",      84,    30,
        "University-1",   "Sweden",       5,    28,
        "University-1",    "Spain",      40,    26,
        "University-1", "Portugal",      49,    29,
        "University-1",    "Italy",      16,    24,
        "University-1",      "USA",      34,    19,
        "University-1",       "UK",      66,    28,
        "University-1",   "Sweden",       9,    25,
        "University-1",    "Spain",      80,    20,
        "University-1", "Portugal",      55,    20,
        "University-1",    "Italy",       4,    21,
        "University-1",      "USA",      93,    18,
        "University-1",       "UK",      62,    28,
        "University-1",   "Sweden",      80,    30,
        "University-2",    "Spain",       1,    22,
        "University-2", "Portugal",      56,    25,
        "University-2",    "Italy",       9,    29,
        "University-2",      "USA",      40,    21,
        "University-2",       "UK",      54,    20,
        "University-2",   "Sweden",      60,    24,
        "University-2",    "Spain",      77,    21,
        "University-2", "Portugal",      22,    18,
        "University-2",    "Italy",      53,    29,
        "University-2",      "USA",      11,    21,
        "University-2",       "UK",      65,    27,
        "University-2",   "Sweden",      24,    27,
        "University-2",    "Spain",      18,    23,
        "University-2", "Portugal",      73,    19,
        "University-2",    "Italy",      79,    22,
        "University-1",      "USA",       2,    26,
        "University-1",       "UK",      83,    23,
        "University-1",   "Sweden",       5,    19,
        "University-1",    "Spain",      75,    19,
        "University-1", "Portugal",      12,    21,
        "University-1",    "Italy",      68,    29,
        "University-1",      "USA",     100,    21,
        "University-1",       "UK",      49,    21,
        "University-1",   "Sweden",      81,    20,
        "University-1",    "Spain",      99,    23,
        "University-1", "Portugal",      82,    24,
        "University-1",    "Italy",      23,    26,
        "University-1",      "USA",      86,    30,
        "University-1",       "UK",      50,    20,
        "University-1",   "Sweden",       4,    19,
        "University-2",    "Spain",      12,    25,
        "University-2", "Portugal",      12,    21,
        "University-2",    "Italy",      45,    21,
        "University-2",      "USA",      16,    26,
        "University-2",       "UK",      56,    23,
        "University-2",   "Sweden",      63,    24,
        "University-2",    "Spain",      37,    28,
        "University-2", "Portugal",      86,    21,
        "University-2",    "Italy",      95,    18,
        "University-2",      "USA",      56,    20,
        "University-2",       "UK",      27,    20,
        "University-2",   "Sweden",       3,    27,
        "University-2",    "Spain",      18,    27,
        "University-2", "Portugal",      68,    27,
        "University-2",    "Italy",      48,    21
        )

Use dplyr and kable to generate the desired table:

  df %>% 
  group_by(University,Countries) %>%
  summarise(Grades_min = min(Grades), 
            Grades_mean = mean(Grades),
            Grades_max = max(Grades),
            Grades_sd = sd(Grades),
            Age_min = min(Age),
            Age_mean= mean(Age),
            Age_max = max(Age),
            Age_sd = sd(Age)) %>% 
  knitr::kable(col.names = c("University", 
                             "Country", 
                             "Min Grade", 
                             "Mean Grade", 
                             "Max Grade", 
                             "Grade SD", 
                             "Min Age", 
                             "Mean Age", 
                             "Max Age", 
                             "Age SD"))


|University   |Country  | Min Grade| Mean Grade| Max Grade| Grade SD| Min Age| Mean Age| Max Age|   Age SD|
|:------------|:--------|---------:|----------:|---------:|--------:|-------:|--------:|-------:|--------:|
|University-1 |Italy    |         4|   27.75000|        68| 27.95681|      21| 25.00000|      29| 3.366502|
|University-1 |Portugal |        12|   49.50000|        82| 28.82707|      20| 23.50000|      29| 4.041452|
|University-1 |Spain    |        40|   73.50000|        99| 24.61030|      19| 22.00000|      26| 3.162278|
|University-1 |Sweden   |         4|   30.66667|        81| 38.64022|      19| 23.50000|      30| 4.847680|
|University-1 |UK       |        49|   65.66667|        84| 15.31883|      20| 25.00000|      30| 4.195235|
|University-1 |USA      |         2|   60.16667|       100| 38.98931|      18| 23.83333|      30| 5.192944|
|University-2 |Italy    |         9|   54.83333|        95| 29.81554|      18| 23.33333|      29| 4.589844|
|University-2 |Portugal |        12|   52.83333|        86| 29.54601|      18| 21.83333|      27| 3.488075|
|University-2 |Spain    |         1|   27.16667|        77| 27.06597|      21| 24.33333|      28| 2.804758|
|University-2 |Sweden   |         3|   37.50000|        63| 29.03446|      24| 25.50000|      27| 1.732051|
|University-2 |UK       |        27|   50.50000|        65| 16.38088|      20| 22.50000|      27| 3.316625|
|University-2 |USA      |        11|   30.75000|        56| 21.06142|      20| 22.00000|      26| 2.708013|
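On dplyr 1.0.0 or later (released after this answer was written), the eight `summarise()` expressions above can be collapsed with `across()`. In the sketch below, `df_small` is a hypothetical stand-in holding a few rows copied from the fake data above, so the block runs on its own. Replace it with `df` to reproduce the full table.

```r
library(dplyr)

# A few rows copied from the fake data above (df_small is a hypothetical name)
df_small <- tibble::tribble(
  ~University,    ~Countries, ~Grades, ~Age,
  "University-1", "Italy",         16,   24,
  "University-1", "Italy",          4,   21,
  "University-1", "Italy",         68,   29,
  "University-1", "Italy",         23,   26,
  "University-2", "Sweden",        60,   24,
  "University-2", "Sweden",        24,   27,
  "University-2", "Sweden",        63,   24,
  "University-2", "Sweden",         3,   27
)

# across() applies every function in the named list to every listed column;
# with a list of functions, output names default to "{.col}_{.fn}" (e.g. Grades_min)
summary_tbl <- df_small %>%
  group_by(University, Countries) %>%
  summarise(across(c(Grades, Age),
                   list(min = min, mean = mean, max = max, sd = sd)),
            .groups = "drop")
summary_tbl
```

The result has the same ten columns, in the same order, as the `summarise()` call above. It can therefore be piped into the same `knitr::kable()` call unchanged.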

The advantage of this approach is that it works well if you knit to Word with rmarkdown. If you do, the table looks like this:

[Screenshot: the generated table after knitting to Word]

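You can use the relevant kable() arguments (digits, caption, align) to control the number of digits, the table caption, or the column alignment.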
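The sketch below shows those three arguments. It applies them to a small hypothetical data frame, `stats_small`, whose values are copied from two rows of the output table above; the caption text is made up.

```r
library(knitr)

# Hypothetical stand-in for the summary table (values copied from the output above)
stats_small <- data.frame(Country    = c("Italy", "Sweden"),
                          Mean_Grade = c(27.75, 37.5),
                          Grade_SD   = c(27.95681, 29.03446))

# digits rounds numeric columns, caption adds a table title,
# align takes one code per column ("l" = left, "r" = right, "c" = centre)
tab <- kable(stats_small,
             digits  = 2,
             caption = "Grade statistics by country",
             align   = c("l", "r", "r"))
tab
```

`digits` can also be a vector with one value per column, if, for example, means and standard deviations need different precision.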

Answer 1 (score: 0)

Maybe stargazer works for you:

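library(dplyr)
library(stargazer)
# stargazer ignores dplyr grouping and expects a plain data.frame, so split by Country
by_country <- split(as.data.frame(select(data, Grades, Age)), data$Country)
for (ctry in names(by_country)) {
  stargazer(by_country[[ctry]], type = "text", title = ctry,
            summary.stat = c("min", "mean", "max", "sd"))
}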