Creating a summary statistics table from multiple categories in a variable

Time: 2019-04-24 18:05:19

Tags: r dataframe datatable dplyr summary

I have a data frame that looks like this:

 ID    category                          Household Income     Tercile   
  1     unmarried couple                    100,000             Middle
  2     married couple                      150,000             Bottom
  3     single Female head of Household     90,000              Top
  4     single Male Head of Household       80,000              Bottom

I want to create a summary statistics table that shows the SD, mean, minimum, maximum, and median of household income, grouped by each category and tercile.

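I was able to produce a similar table for one of the categories. Here is the code for unmarried couples.

First, I separated that category from the overall data frame and dropped the variables I did not need:

library(dplyr)

status_unmarried <- merged_data %>%
  select(-(person_id:is_college_graduate)) %>%
  select(-(is_urban:is_owner_of_home)) %>%
  filter(category == 'unmarried couple') %>%
  group_by(hh_income, tercile_of_census_tract_income) %>%
  distinct(hh_id, .keep_all = TRUE)

Then I generated the summary statistics:

library(arsenal)  # tableby() comes from the arsenal package, not dplyr
table_one <- tableby(tercile_of_census_tract_income ~ ., data = status_unmarried)
summary(table_one, title = "Unmarried households")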

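I could repeat this process for the other three categories. However, I would like to produce the table from a single block of code that covers all categories, without building a separate table for each one. The table or data frame would look like this: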

        Unmarried Couple   Married Couple  Single Female Head Single Male Head

Bottom
Mean
Median
Min
Max
SD
Sample Size

Middle
Mean
Median
Min
Max
SD
Sample Size

Top
Mean
Median
Min
Max
SD
Sample Size

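Sample size is the number of households in each category. So I want the categories as columns and each statistic as a row, further split by tercile. I would like to end up with a data frame or summary table holding these results.

Thanks in advance!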

2 Answers:

Answer 0: (score: 0)

Consider nested calls to base R's by, which prints a console report with section breaks and headers. Here random_df is a sample data frame with Tercile, category, and Household_Income columns:

tercile_agg_df_list <- by(random_df, random_df$Tercile, function(sub_df) {
   by_list <- by(sub_df, sub_df$category, function(core_df)          
     with(core_df,
          c(mean = mean(Household_Income),  median = median(Household_Income), 
            min = min(Household_Income), max = max(Household_Income),
            sd = sd(Household_Income), sample_size = length(Household_Income))
         )
     )       
   t(do.call(rbind, by_list))
})

tercile_agg_df_list
# random_df$Tercile: Bottom
#             Married Couple Single Female Head Single Male Head Unmarried Couple
# mean             44632.894        50204.52677        58095.923       52521.3178
# median           49678.238        50042.54136        62158.775       51933.3694
# min               1989.695           95.23595         6220.779         676.9893
# max              95896.827        98471.19979        98317.740       94795.6344
# sd               29246.103        31317.47006        25728.368       28013.6172
# sample_size         35.000           56.00000           44.000          39.0000
# ---------------------------------------------------------------------------------- 
# random_df$Tercile: Middle
#             Married Couple Single Female Head Single Male Head Unmarried Couple
# mean             56302.818          54845.140        42645.032         48222.93
# median           63245.388          51364.262        39126.608         49713.41
# min               2690.053           5286.126         3687.153          3430.90
# max              99327.726          99216.564        98645.000         98400.38
# sd               28582.935          32262.149        29996.185         28485.63
# sample_size         42.000             44.000           38.000            44.00
# ---------------------------------------------------------------------------------- 
# random_df$Tercile: Top
#             Married Couple Single Female Head Single Male Head Unmarried Couple
# mean             51437.876         45495.1326     55150.495621        44958.808
# median           54592.978         42051.5708     56452.659052        45982.775
# min               3917.729           376.2815         1.451327         1216.967
# max              99638.078         95885.3950     99429.982156        99412.446
# sd               27627.480         26643.9194     30690.131884        29713.131
# sample_size         46.000            39.0000        31.000000           42.000
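
To run the nested by() call above without the original data, a stand-in random_df can be simulated. This is only a sketch: the seed, the 500-row size, and the uniform income range are assumptions, so it will not reproduce the exact numbers printed above.

```r
# Hypothetical stand-in for random_df; values are simulated, not the answerer's data
set.seed(42)
n <- 500
random_df <- data.frame(
  ID = seq_len(n),
  category = sample(c("Unmarried Couple", "Married Couple",
                      "Single Female Head", "Single Male Head"),
                    n, replace = TRUE),
  Household_Income = runif(n, min = 0, max = 100000),
  Tercile = sample(c("Bottom", "Middle", "Top"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Same nested by() logic as above: outer split on Tercile, inner split on category
tercile_agg_df_list <- by(random_df, random_df$Tercile, function(sub_df) {
  by_list <- by(sub_df, sub_df$category, function(core_df)
    with(core_df,
         c(mean = mean(Household_Income), median = median(Household_Income),
           min = min(Household_Income), max = max(Household_Income),
           sd = sd(Household_Income), sample_size = length(Household_Income))))
  # rbind gives one row per category; t() puts categories in columns
  t(do.call(rbind, by_list))
})

# Each element is a matrix with statistics in rows and categories in columns
tercile_agg_df_list[["Bottom"]]
```

Each tercile's matrix can be pulled out by name, as in the last line, if a single block is needed for further processing.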

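Answer 1: (score: 0)

Try this code with the data.table package. You may need to convert your data frame to a data.table with the as.data.table function. Assuming the data.table is named dt: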

dt[, .(Min = min(Income), First_Quartile = quantile(Income, 0.25),
       Median = quantile(Income, 0.5), Mean = mean(Income),
       Third_Quartile = quantile(Income, 0.75),
       Max = max(Income)),
   by = .(Category, Tercile)]

This produces the table in a different format, but I think it is more organized.
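
If the layout from the question (categories as columns, statistics as rows within each tercile) is still wanted, a long grouped summary like this can be reshaped with data.table's melt() and dcast(). A minimal sketch, assuming a simulated dt with Category, Tercile, and Income columns; the simulated values and the SD and N columns are illustrative assumptions added to match the question, not part of the answer above.

```r
library(data.table)

# Hypothetical sample data; column names follow the data.table answer
set.seed(1)
dt <- data.table(
  Category = sample(c("Unmarried Couple", "Married Couple",
                      "Single Female Head", "Single Male Head"),
                    400, replace = TRUE),
  Tercile  = sample(c("Bottom", "Middle", "Top"), 400, replace = TRUE),
  Income   = runif(400, 0, 100000)
)

# Long summary: one row per Category x Tercile combination
# (N is stored as numeric so every statistic column has the same type)
stats_long <- dt[, .(Mean = mean(Income), Median = median(Income),
                     Min = min(Income), Max = max(Income),
                     SD = sd(Income), N = as.numeric(.N)),
                 by = .(Category, Tercile)]

# Stack the statistics into one column, then spread categories into columns
stats_melted <- melt(stats_long, id.vars = c("Category", "Tercile"),
                     variable.name = "Statistic", value.name = "Value")
stats_wide <- dcast(stats_melted, Tercile + Statistic ~ Category,
                    value.var = "Value")
stats_wide
```

The result has one row per tercile and statistic, with one column per category, which is close to the table sketched in the question.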