Question

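I want to bin one column against another. The question is: what percentage of cars are red in each age group: 35 and under, 36-48, and 49 and over?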

Data

AGE RED_CAR freq
16  yes 2
17  yes 1
18  yes 2
19  yes 1
21  yes 1
22  yes 8
23  yes 4
24  yes 13
25  yes 5
26  yes 7
27  yes 4
28  yes 5
29  yes 5
30  yes 9

Please help.

Answer 1

There are several ways to solve this. My preferred approach is to use the cut function together with tapply.
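As a quick illustration of how the two functions fit together, here is a minimal sketch. The ages and counts are made up for the example, and the breaks at 35 and 48 are taken from the age groups in the question. By default cut builds intervals that are closed on the right, so an age of 35 falls in the first group and 36 in the second.

```r
# Hypothetical ages chosen to sit on either side of each break
ages <- c(16, 35, 36, 48, 49)

# breaks = c(0, 35, 48, 99) gives the intervals (0,35], (35,48], (48,99]
groups <- cut(ages, breaks = c(0, 35, 48, 99))
print(groups)

# tapply applies a function (here sum) to the values that fall in each group
counts <- c(2, 1, 4, 3, 5)  # made-up counts, one per age
totals <- tapply(counts, groups, sum)
print(totals)
```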

Here is some sample data:

AGE<-c(16:19, 21:30)
RED_CAR<-rep(c("yes", "no"), 7)
freq<-c(2, 1, 2, 1, 1, 8, 4, 13, 5, 7, 4, 5, 5, 9)
df<-data.frame(AGE, RED_CAR, freq)

Here is the solution:

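#define the age breaks, include a low and high limit to include all cases
# in this case breaks are at 17 and 48, so the bins are (0,17], (17,48] and (48,99]
# for the age groups in the question (up to 35, 36-48, 49 and over) use c(0, 35, 48, 99)
agebreaks<-c(0, 17, 48, 99)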

#tapply to sum the counts, includes the filtering for RED_CAR==yes
countbyage<-tapply(df$freq[df$RED_CAR=="yes"], cut(df$AGE[df$RED_CAR=="yes"], breaks= agebreaks), sum)

#divide bin counts by total count (this is a proportion; multiply by 100 for a percentage)
percentage<-countbyage/sum(df$freq)
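Note that the calculation above divides by the grand total, so each value is the share of all cars that are both red and in that age bin. If what is wanted is the share of red cars within each age group, one option is to divide by each bin's own total instead. The sketch below assumes the same sample data and uses the breaks from the question; the names agegroup, is_red, red_by_age, all_by_age and pct_red are just illustrative.

```r
# Sample data, same as above
AGE <- c(16:19, 21:30)
RED_CAR <- rep(c("yes", "no"), 7)
freq <- c(2, 1, 2, 1, 1, 8, 4, 13, 5, 7, 4, 5, 5, 9)
df <- data.frame(AGE, RED_CAR, freq)

# Age groups from the question: up to 35, 36-48, 49 and over
agegroup <- cut(df$AGE, breaks = c(0, 35, 48, 99))

# Red-car counts and overall counts per age group
is_red <- df$RED_CAR == "yes"
red_by_age <- tapply(df$freq[is_red], agegroup[is_red], sum)
all_by_age <- tapply(df$freq, agegroup, sum)

# Percentage of cars that are red within each group
# (groups with no observations come out as NA)
pct_red <- 100 * red_by_age / all_by_age
print(pct_red)
```

With this sample data every age is 30 or below, so only the first group has a value and the other two are NA.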

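The dplyr package has functions such as group_by, tally and count that can also do this. Base R is fine for basic analysis, but for large datasets dplyr is the better solution.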
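For reference, a dplyr version might look something like the sketch below. It assumes dplyr is installed, reuses the sample data, and uses group_by with summarise rather than tally or count; the column names agegroup, red, total and pct_red are made up for the example.

```r
library(dplyr)

# Sample data, same as in the base R example
AGE <- c(16:19, 21:30)
RED_CAR <- rep(c("yes", "no"), 7)
freq <- c(2, 1, 2, 1, 1, 8, 4, 13, 5, 7, 4, 5, 5, 9)
df <- data.frame(AGE, RED_CAR, freq)

result <- df %>%
  # bin the ages into the groups from the question
  mutate(agegroup = cut(AGE, breaks = c(0, 35, 48, 99))) %>%
  group_by(agegroup) %>%
  # red-car count, total count and percentage red within each group
  summarise(
    red = sum(freq[RED_CAR == "yes"]),
    total = sum(freq),
    pct_red = 100 * red / total
  )
print(result)
```

By default group_by drops age groups that have no rows, so empty groups do not appear in the result at all rather than showing up as NA.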
