I am trying to write a function that sorts values into different groups.
Suppose my data look like this:
birthyear
1987 1995 1994 1981 1994 1989 1985 1987 1996 1981 1980 1994 1996 1983 1949 1988
1998 1977 1967 1968
My function converts each birth year to an age and then assigns it to one of 10 categories, based on a data frame called agebreaks:
> agebreaks
            Category Birth.min Birth.max
1     14 to 19 years      2000      1995
2     20 to 24 years      1994      1990
3     25 to 34 years      1989      1980
4     35 to 44 years      1979      1970
5     45 to 54 years      1969      1960
6     55 to 59 years      1959      1955
7     60 to 64 years      1954      1950
8     65 to 74 years      1949      1940
9     75 to 84 years      1939      1930
10 85 years and over      1959      1864
The function:
bin.age <- function(birthyear, agebreak, yyyy = 2014){
p.ages <- yyyy - birthyear
ab <- as.data.frame(agebreak)
min.ab <- yyyy-ab$Birth.min
max.ab <- yyyy-ab$Birth.max
avec <- sort(c(min.ab[1],max.ab[1],min.ab[2],max.ab[2],min.ab[3],max.ab[3],min.ab[4],max.ab[4],min.ab[5],max.ab[5],min.ab[6],max.ab[6],min.ab[7],max.ab[7],min.ab[8],max.ab[8],min.ab[9],max.ab[9],min.ab[10],max.ab[10]))
tmp <- findInterval(p.ages, avec)
tt <- table(tmp)
names(tt)<-c("14 to 19 years","20 to 24 years","25 to 34 years","35 to 44 years","45 to 54 years","55 to 59 years","60 to 64 years","65 to 74 years","75 to 84 years","85 years and over")
return(tt)
}
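What I want is all the 14-to-19-year-olds grouped together, all the 20-to-24-year-olds grouped together, and so on. Instead of the 10 groups I need, I get 20 or 18 groups. I have also tried cut(), to no avail. Any suggestions?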
Answer 0 (score: 1)
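cut() is probably the right function. The problem is that you only need to specify the break points between the ranges, not the start and end of every interval, because the variable is treated as continuous.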
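As a small illustration of that point (a sketch with made-up numbers, not the data from the question): k + 1 break points define k adjacent intervals, so every interior break is the end of one bin and the start of the next.

```r
# Toy values and breaks, chosen only for illustration
x <- c(3, 10, 11, 20, 27)
breaks <- c(0, 10, 20, 30)   # 4 break points -> 3 intervals

# With the default right = TRUE the intervals are (0,10], (10,20], (20,30]
bins <- cut(x, breaks)
nlevels(bins)   # number of bins
table(bins)     # counts per bin
```

Passing both ends of every range, as the 20-element vector in the question does, therefore creates roughly twice as many intervals as there are categories.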
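# input data
birthyear <- c(1987, 1995, 1994, 1981, 1994, 1989, 1985, 1987, 1996, 1981,
               1980, 1994, 1996, 1983, 1949, 1988, 1998, 1977, 1967, 1968)
# one value per bin edge: 1864 is the lower bound, every later value is the
# latest birth year of a category (oldest category first)
agebreaks <- c(1864, 1929, 1939, 1949, 1954, 1959, 1969, 1979, 1989, 1994, 2000)
# cut into intervals (lo, hi]; include.lowest = TRUE also keeps 1864 itself
a <- cut(birthyear, agebreaks, include.lowest = TRUE)
# rename: the levels run from oldest to youngest, hence rev()
levels(a) <- rev(c("14 to 19 years", "20 to 24 years", "25 to 34 years",
                   "35 to 44 years", "45 to 54 years", "55 to 59 years", "60 to 64 years",
                   "65 to 74 years", "75 to 84 years", "85 years and over"))
# table
as.data.frame(table(a))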
# result
                   a Freq
1  85 years and over    0
2     75 to 84 years    0
3     65 to 74 years    1
4     60 to 64 years    0
5     55 to 59 years    0
6     45 to 54 years    2
7     35 to 44 years    1
8     25 to 34 years    9
9     20 to 24 years    3
10    14 to 19 years    4
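If the breaks should come from the agebreaks data frame in the question instead of being typed by hand, the same idea could be wrapped in a function. This is only a sketch under some assumptions: `bin_birthyear` is a hypothetical name, the categories are assumed to be contiguous, and Birth.max is assumed to hold the earliest birth year of each category. It uses only the Birth.max column plus the largest Birth.min value, so the breaks are left-closed here rather than right-closed as in the code above.

```r
# Age categories as listed in the question. Birth.max holds the earliest
# birth year of each category and Birth.min the latest.
agebreaks <- data.frame(
  Category  = c("14 to 19 years", "20 to 24 years", "25 to 34 years",
                "35 to 44 years", "45 to 54 years", "55 to 59 years",
                "60 to 64 years", "65 to 74 years", "75 to 84 years",
                "85 years and over"),
  Birth.min = c(2000, 1994, 1989, 1979, 1969, 1959, 1954, 1949, 1939, 1959),
  Birth.max = c(1995, 1990, 1980, 1970, 1960, 1955, 1950, 1940, 1930, 1864),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count birth years per category
bin_birthyear <- function(birthyear, ab) {
  ord <- order(ab$Birth.max)                 # oldest category first
  # Lower bound of every category, plus the overall latest birth year
  breaks <- c(ab$Birth.max[ord], max(ab$Birth.min))
  # right = FALSE gives intervals [lo, hi); include.lowest = TRUE then
  # closes the last interval so the latest birth year is kept
  bins <- cut(birthyear, breaks, labels = ab$Category[ord],
              right = FALSE, include.lowest = TRUE)
  as.data.frame(table(Category = bins))
}

birthyear <- c(1987, 1995, 1994, 1981, 1994, 1989, 1985, 1987, 1996, 1981,
               1980, 1994, 1996, 1983, 1949, 1988, 1998, 1977, 1967, 1968)
res <- bin_birthyear(birthyear, agebreaks)
res
```

Taking the labels from the Category column keeps them tied to the breaks, so reordering or editing the data frame does not require a matching change to a hand-written rev() call.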