Question

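I'm working on a project that looks at the relationship between family income and the number of children. For simplicity, suppose I have data like this: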

df <- data.frame(children = sample(0:9, 100, replace=TRUE),
                 income = floor(rnorm(100, 30000, 10000)))

I split income into four groups at the 1st quartile, the median, and the 3rd quartile:

# quantile() returns the 0%, 25%, 50%, 75% and 100% points; keep the middle three
income.br <- with(df, c(-Inf, quantile(income)[2:4], Inf))
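To check what these breaks look like, here is a small sketch on reproducible toy data (the seed and the name `grp.sizes` are arbitrary choices for illustration). With quartile breaks, each income group should hold roughly a quarter of the families.

```r
# Reproducible toy data in the same shape as the question (seed is arbitrary)
set.seed(1)
df <- data.frame(children = sample(0:9, 100, replace = TRUE),
                 income   = floor(rnorm(100, 30000, 10000)))

# With its default probs, quantile() returns the 0%, 25%, 50%, 75% and 100% points
q <- quantile(df$income)
income.br <- c(-Inf, q[2:4], Inf)

# Count families per income group; each should be close to 100 / 4
grp.sizes <- table(cut(df$income, breaks = income.br))
grp.sizes
```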

Then I cross-tabulated the number of children against those income groups:

x <- with(df, table(children, cut(income, breaks = income.br)))

Now I need to calculate the mean number of children in each income group. Here is what I did:

apply(x * as.numeric(levels(factor(df$children))), 2, sum) / apply(x, 2, sum)

It looks clumsy, so I'm wondering whether there is a better way to do this (maybe a one-way ANOVA?). Thanks!
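If you want to stay with the contingency table, the same weighted mean (sum of count times value, divided by the total count, per column) can be written a bit more compactly with `colSums()`. A minimal sketch, assuming the same toy data as above; `n.children` and `grp.means` are just illustrative names.

```r
# Reproducible toy data in the same shape as the question (seed is arbitrary)
set.seed(1)
df <- data.frame(children = sample(0:9, 100, replace = TRUE),
                 income   = floor(rnorm(100, 30000, 10000)))

income.br <- with(df, c(-Inf, quantile(income)[2:4], Inf))
x <- with(df, table(children, cut(income, breaks = income.br)))

# Row names of the table are the observed children counts; convert them to numbers
n.children <- as.numeric(rownames(x))

# x * n.children recycles down the columns, so row i is multiplied by n.children[i];
# colSums then gives sum(count * value) / sum(count) for each income group
grp.means <- colSums(x * n.children) / colSums(x)
grp.means
```

This is the same arithmetic as the `apply()` version, just without the explicit `apply(..., 2, sum)` calls.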

Answer 1

Maybe this is what you want:

> with(df, tapply(children, cut(income, c(-Inf, quantile(income)[2:4], Inf)), mean))
    (-Inf,2.35e+04] (2.35e+04,2.96e+04] (2.96e+04,3.82e+04]     (3.82e+04, Inf] 
               5.32                4.40                4.36                3.84
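On the ANOVA part of the question: computing the group means does not require an ANOVA, but a one-way ANOVA is the usual way to test whether those means differ. A minimal sketch, assuming the same toy data and a hypothetical grouping column `income.grp`: a linear model without an intercept returns the group means as its coefficients, and `aov()` gives the corresponding F-test.

```r
# Reproducible toy data in the same shape as the question (seed is arbitrary)
set.seed(1)
df <- data.frame(children = sample(0:9, 100, replace = TRUE),
                 income   = floor(rnorm(100, 30000, 10000)))

# Income quartile groups; income.grp is a hypothetical column name.
# dig.lab = 6 only widens the interval labels so they avoid scientific notation.
df$income.grp <- with(df, cut(income, c(-Inf, quantile(income)[2:4], Inf),
                              dig.lab = 6))

# Cell-means parameterisation: dropping the intercept (- 1) makes each
# coefficient the mean number of children in one income group
fit.means <- lm(children ~ income.grp - 1, data = df)
coef(fit.means)

# One-way ANOVA: does the mean number of children differ across the four groups?
fit.aov <- aov(children ~ income.grp, data = df)
summary(fit.aov)
```

The coefficients of `fit.means` match the `tapply()` result above; the ANOVA table adds the test of equal means on top of that.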

Calculating the mean within table groups (possibly with ANOVA?)
