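I have a data frame that looks like this:
ID  category                          Household Income  Tercile
1   unmarried couple                  100,000           Middle
2   married couple                    150,000           Bottom
3   single Female head of Household   90,000            Top
4   single Male Head of Household     80,000            Bottom
I want to create a summary statistics table that shows the SD, mean, minimum, maximum, and median of household income, grouped by each category and tercile.
I was able to produce a table like this for one of the categories. Here is the code for unmarried couples:
First, I separated the category from the overall data frame and dropped the variables I did not need: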
status_unmarried <- merged_data %>%
select(-(person_id:is_college_graduate)) %>%
select(-(is_urban:is_owner_of_home)) %>%
filter(category == 'unmarried couple') %>%
group_by(hh_income, tercile_of_census_tract_income) %>%
distinct(hh_id, .keep_all = TRUE)
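Then I generated the necessary summary statistics:
library(dplyr)
library(arsenal)  # tableby() comes from the arsenal package, not dplyr
table_one <- tableby(tercile_of_census_tract_income ~ ., data = status_unmarried)
summary(table_one, title = "Unmarried households")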
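I could repeat this process for the remaining three categories. However, I would like to produce the table with all categories combined in a single code block, without having to create a separate table for each category. The table or data frame would look like this: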
                 Unmarried Couple   Married Couple   Single Female Head   Single Male Head
Bottom
  Mean
  Median
  Min
  Max
  SD
  Sample Size
Middle
  Mean
  Median
  Min
  Max
  SD
  Sample Size
Top
  Mean
  Median
  Min
  Max
  SD
  Sample Size
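The sample size is the number of households in each category. So I want the categories as columns and the statistics as rows, further broken down by tercile. I would like to build a data frame or summary table from these results.
Thanks in advance!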
Answer 0 (score: 0)
Consider nested calls to base R's by, which produces a console report with section breaks and headers. The code below assumes a data frame random_df with the columns category, Household_Income, and Tercile.
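The sample data used by this answer is not shown here. A minimal sketch of a comparable data frame follows, assuming 500 rows and uniformly distributed incomes between 0 and 100,000; the seed, the row count, and the distribution are assumptions made for illustration, so the numbers will not match the report in this answer.

```r
# Hypothetical stand-in for the answer's random_df (not the original data).
set.seed(42)  # arbitrary seed, only for reproducibility of this sketch

categories <- c("Married Couple", "Single Female Head",
                "Single Male Head", "Unmarried Couple")
terciles <- c("Bottom", "Middle", "Top")
n <- 500  # assumed number of households

random_df <- data.frame(
  ID = seq_len(n),
  category = sample(categories, n, replace = TRUE),
  Household_Income = runif(n, min = 0, max = 100000),
  Tercile = sample(terciles, n, replace = TRUE),
  stringsAsFactors = FALSE
)

head(random_df)
```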
tercile_agg_df_list <- by(random_df, random_df$Tercile, function(sub_df) {
by_list <- by(sub_df, sub_df$category, function(core_df)
with(core_df,
c(mean = mean(Household_Income), median = median(Household_Income),
min = min(Household_Income), max = max(Household_Income),
sd = sd(Household_Income), sample_size = length(Household_Income))
)
)
t(do.call(rbind, by_list))
})
tercile_agg_df_list
# random_df$Tercile: Bottom
# Married Couple Single Female Head Single Male Head Unmarried Couple
# mean 44632.894 50204.52677 58095.923 52521.3178
# median 49678.238 50042.54136 62158.775 51933.3694
# min 1989.695 95.23595 6220.779 676.9893
# max 95896.827 98471.19979 98317.740 94795.6344
# sd 29246.103 31317.47006 25728.368 28013.6172
# sample_size 35.000 56.00000 44.000 39.0000
# ----------------------------------------------------------------------------------
# random_df$Tercile: Middle
# Married Couple Single Female Head Single Male Head Unmarried Couple
# mean 56302.818 54845.140 42645.032 48222.93
# median 63245.388 51364.262 39126.608 49713.41
# min 2690.053 5286.126 3687.153 3430.90
# max 99327.726 99216.564 98645.000 98400.38
# sd 28582.935 32262.149 29996.185 28485.63
# sample_size 42.000 44.000 38.000 44.00
# ----------------------------------------------------------------------------------
# random_df$Tercile: Top
# Married Couple Single Female Head Single Male Head Unmarried Couple
# mean 51437.876 45495.1326 55150.495621 44958.808
# median 54592.978 42051.5708 56452.659052 45982.775
# min 3917.729 376.2815 1.451327 1216.967
# max 99638.078 95885.3950 99429.982156 99412.446
# sd 27627.480 26643.9194 30690.131884 29713.131
# sample_size 46.000 39.0000 31.000000 42.000
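Since the question asks for a single data frame rather than a console report, the per-tercile matrices can be stacked into one. The following is a sketch of one way to do that with split and sapply instead of by; the small data set, the stats helper, and the summary_df name are hypothetical, and the sketch assumes every tercile contains every category.

```r
# Hypothetical toy data: 5 households for every category/tercile combination.
categories <- c("Married Couple", "Single Female Head",
                "Single Male Head", "Unmarried Couple")
terciles <- c("Bottom", "Middle", "Top")
grid <- expand.grid(category = categories, Tercile = terciles, rep = 1:5,
                    stringsAsFactors = FALSE)
set.seed(1)  # arbitrary seed
random_df <- data.frame(category = grid$category,
                        Tercile = grid$Tercile,
                        Household_Income = runif(nrow(grid), 0, 100000),
                        stringsAsFactors = FALSE)

# Helper returning the requested statistics as a named vector.
stats <- function(x) {
  c(Mean = mean(x), Median = median(x), Min = min(x), Max = max(x),
    SD = sd(x), Sample_Size = length(x))
}

# One statistics-by-category matrix per tercile.
by_tercile <- lapply(split(random_df, random_df$Tercile), function(sub_df) {
  sapply(split(sub_df$Household_Income, sub_df$category), stats)
})

# Stack the matrices, labelling each block with its tercile.
summary_df <- do.call(rbind, lapply(names(by_tercile), function(tc) {
  m <- by_tercile[[tc]]
  data.frame(Tercile = tc, Statistic = rownames(m), m,
             row.names = NULL, check.names = FALSE)
}))

summary_df
```

The result has one row per tercile and statistic and one column per category, which is close to the layout sketched in the question.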
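Answer 1 (score: 0)
Try this code, which uses the data.table package. You may need to convert your data frame to a data.table with the as.data.table function. Assuming the data.table is named dt: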
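# Replace Income, Category, and Tercile with the column names in your data
dt[, .(Min = min(Income),
       First_Quartile = quantile(Income, 0.25),  # first quartile is the 25th percentile
       Median = quantile(Income, 0.5),
       Mean = mean(Income),
       Third_Quartile = quantile(Income, 0.75),
       Max = max(Income)),
   by = .(Category, Tercile)]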
This produces the table in a different format (one row per category and tercile), but I think it is more organized.
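The question also asks for the standard deviation and the sample size, which the call above does not compute. A sketch of the same grouped summary with those two added is shown here; the toy data and the column names category, Tercile, and Household_Income are assumptions for illustration, and the data.table package must be installed.

```r
library(data.table)

# Hypothetical toy data: 5 households for every category/tercile combination.
categories <- c("Married Couple", "Single Female Head",
                "Single Male Head", "Unmarried Couple")
terciles <- c("Bottom", "Middle", "Top")
grid <- expand.grid(category = categories, Tercile = terciles, rep = 1:5,
                    stringsAsFactors = FALSE)
set.seed(1)  # arbitrary seed
dt <- as.data.table(data.frame(category = grid$category,
                               Tercile = grid$Tercile,
                               Household_Income = runif(nrow(grid), 0, 100000),
                               stringsAsFactors = FALSE))

# One row per category/tercile pair; .N is the number of rows in each group.
summary_dt <- dt[, .(Mean = mean(Household_Income),
                     Median = median(Household_Income),
                     Min = min(Household_Income),
                     Max = max(Household_Income),
                     SD = sd(Household_Income),
                     Sample_Size = .N),
                 by = .(category, Tercile)]

summary_dt
```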