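I'd like a simple way to categorize birth years. I tried cut(), which already looks promising, but I can't get it to work quite right.
Given two samples of birth years: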
set.seed(42)
s.even <- sample(2000:2015, 100, replace=TRUE)
s.odd <- sample(1998:2017, 100, replace=TRUE)
s.odd.s <- sort(s.odd)  # sorted copy of the "odd" sample, used below
With the "even" sample the output is fine:
df.even <- data.frame(birthyear=s.even,
                      category=cut(s.even, 3,
                                   labels=c("youth", "young", "youngsters")))
> with(df.even, ftable(category, birthyear))
           birthyear 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
category
youth                   8    4    5    7    5    4    0    0    0    0    0    0    0    0    0    0
young                   0    0    0    0    0    0    7    5    6    6    8    0    0    0    0    0
youngsters              0    0    0    0    0    0    0    0    0    0    0    9    4    5    9    8
But with the "odd" sample the breaks are not placed where I want them, i.e. I'd like the first category to contain 1998:2005 and the second 2006:2010:
df.odd <- data.frame(birthyear=s.odd.s,
                     category=cut(s.odd.s, 3,
                                  labels=c("youth", "young", "youngsters")))
> with(df.odd, ftable(category, birthyear))
           birthyear 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
category
youth                   3    3   10    6    3    3    3    0    0    0    0    0    0    0    0    0    0    0    0    0
young                   0    0    0    0    0    0    0    5    4    4    5    5    7    0    0    0    0    0    0    0
youngsters              0    0    0    0    0    0    0    0    0    0    0    0    0    2   11    9    5    2    8    2
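For reference, the placement above follows from how cut() treats a single number as breaks: it divides the range of the data into that many intervals of equal width (after nudging the outer limits slightly outward), so the cut points land at fractional years rather than at the years one might want. A minimal sketch, assuming a deterministic stand-in vector `years` (one value per year, not the random sample above) so the result is reproducible:

```r
# Hypothetical stand-in data: every year from 1998 to 2017 exactly once
years <- 1998:2017

# A single number as `breaks` asks for that many equal-width intervals
auto <- cut(years, 3, labels = c("youth", "young", "youngsters"))

# First and last year that ended up in each category
span <- tapply(years, auto, range)
print(span)
# youth covers 1998 to 2004, young 2005 to 2010, youngsters 2011 to 2017
```

The interior cut points here are roughly 2004.33 and 2010.67, which is why the categories split after 2004 and after 2010.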
So I tried to set the breaks manually like this:
> cut(s.odd.s, s.odd.s[c(1,
+ which(s.odd.s %% 5 == 0 & !duplicated(s.odd.s)),
+ length(s.odd.s))])
[1] <NA> <NA> <NA> (1998,2000] (1998,2000] (1998,2000] (1998,2000]
[8] (1998,2000] (1998,2000] (1998,2000] (1998,2000] (1998,2000] (1998,2000] (1998,2000]
[15] (1998,2000] (1998,2000] (2000,2005] (2000,2005] (2000,2005] (2000,2005] (2000,2005]
[22] (2000,2005] (2000,2005] (2000,2005] (2000,2005] (2000,2005] (2000,2005] (2000,2005]
[29] (2000,2005] (2000,2005] (2000,2005] (2000,2005] (2000,2005] (2000,2005] (2000,2005]
[36] (2000,2005] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2005,2010]
[43] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2005,2010]
[50] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2005,2010]
[57] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2005,2010] (2010,2015] (2010,2015]
[64] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015]
[71] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015]
[78] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015]
[85] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2010,2015] (2015,2017]
[92] (2015,2017] (2015,2017] (2015,2017] (2015,2017] (2015,2017] (2015,2017] (2015,2017]
[99] (2015,2017] (2015,2017]
Levels: (1998,2000] (2000,2005] (2005,2010] (2010,2015] (2015,2017]
But somehow 1998 is excluded (the first three values come back as NA):
> head(s.odd.s)
[1] 1998 1998 1998 1999 1999 1999
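A likely explanation, sketched here under the assumption that the breaks are the ones shown in the Levels line above: by default cut() builds intervals that are open on the left and closed on the right, so a value equal to the lowest break falls outside the first interval `(1998,2000]` and becomes NA. The `include.lowest` argument of cut() closes that first interval. The vector `years` is again a hypothetical stand-in for the sample:

```r
# Hypothetical stand-in data and the break points from the Levels line above
years <- 1998:2017
brks <- c(1998, 2000, 2005, 2010, 2015, 2017)

# Default: intervals are (a, b], so 1998 itself is not in (1998, 2000]
default <- cut(years, brks)

# include.lowest = TRUE turns the first interval into [1998, 2000]
closed <- cut(years, brks, include.lowest = TRUE)

is.na(default[years == 1998])  # TRUE: 1998 is dropped
is.na(closed[years == 1998])   # FALSE: 1998 is kept
```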
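Anyway, maybe I've missed an option that can be set in cut()? I'd also like to be able to start the three categories at arbitrary "even" turning points, i.e. 1998:2004, 2005:2009 and 2010:2017.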
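One possible way to get exactly these three classes, offered as a sketch rather than the definitive answer: pass the break points explicitly and use `right = FALSE`, so that each break is the first year of its class. The vector `years` and the upper limit 2018 (one past the last year, so that 2017 is included) are assumptions made for this illustration:

```r
# Hypothetical stand-in data: every year from 1998 to 2017 exactly once
years <- 1998:2017

# right = FALSE gives intervals [a, b): each break starts a new class.
# The last break is one past the final year so that 2017 is still covered.
wanted <- cut(years,
              breaks = c(1998, 2005, 2010, 2018),
              labels = c("youth", "young", "youngsters"),
              right = FALSE)

# Number of distinct years per class: 7, 5 and 8
table(wanted)
```

With the real data, `s.odd` would take the place of `years`; the breaks can be moved to any other starting years without touching the rest of the call.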