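Is there a simple way in R to compute the difference between two columns of two-digit years (whole years only, no months or days, since they aren't needed here) in order to produce an age column?
I'm very new to this and have been trying 'if' statements and arithmetic without success.
The data looks like this, but larger: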
dat <- data.frame(year1=c("98","99","00","01","02"),
year2=c("03","04","05","06","07"))
Answer 0 (score: 3)
You can use strptime() with the format %y:
dat <- data.frame(year1=c("98","99","00","01","02"),
year2=c("03","04","05","06","07"),
stringsAsFactors = FALSE) # You might want to use this as a default!
dat$year1 <- strptime(dat$year1, format = "%y")
dat$year2 <- strptime(dat$year2, format = "%y")
as.vector(difftime(dat$year2,
dat$year1,
units = "days"))/365.242
4.999311 5.002163 4.999425 4.999425 4.999425
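If whole years are wanted rather than fractional ones, one option is to read the year field of the parsed dates directly. The following is a minimal sketch, not part of the original answer. It pins every value to 1 January and parses in UTC (both are assumptions made here so the result does not depend on the day or time zone in which the code is run), then uses the year component of the POSIXlt objects that strptime() returns.

```r
dat <- data.frame(year1 = c("98", "99", "00", "01", "02"),
                  year2 = c("03", "04", "05", "06", "07"),
                  stringsAsFactors = FALSE)

# Parse each two-digit year as 1 January of that year (assumed anchor date).
# tz = "UTC" avoids daylight-saving effects.
y1 <- strptime(paste0(dat$year1, "-01-01"), format = "%y-%m-%d", tz = "UTC")
y2 <- strptime(paste0(dat$year2, "-01-01"), format = "%y-%m-%d", tz = "UTC")

# POSIXlt stores the year as "years since 1900", so subtracting the two
# year fields gives the difference in whole calendar years.
dat$age <- y2$year - y1$year
dat$age
```

For the sample data this gives an age of 5 in every row.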
Answer 1 (score: 2)
Format as a date, format as a number, take the difference:
# Columns are passed as year2, year1 so the result is year2 - year1 (a positive age)
do.call(`-`, lapply(dat[2:1], function(x)
as.numeric(format(as.Date(x, format="%y"), "%Y"))))
#[1] 5 5 5 5 5
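This can give wrong results if you have old dates from the early 1900s. According to ?strptime, on input %y values 00 to 68 are prefixed by 20 and 69 to 99 by 19.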
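A quick way to see how %y assigns centuries is to parse a few values on either side of the cut-off. This is only an illustrative sketch: the input vector is made up for the demonstration, and each value is pinned to 1 January (an assumption) so the parse does not depend on today's date.

```r
# Hypothetical inputs chosen to straddle the 68/69 cut-off.
yy <- c("00", "18", "68", "69", "99")

# Parse as 1 January of the two-digit year, then print the four-digit year.
full_year <- format(as.Date(paste0(yy, "-01-01"), format = "%y-%m-%d"), "%Y")
full_year
# "2000" "2018" "2068" "1969" "1999"
```

So a birth year of 18 that was meant as 1918 would be read as 2018, and any difference computed from it would be off by a century.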
Answer 2 (score: 0)
df$age <- ifelse(df$year2 < df$year1, df$year2 - df$year1 + 100, df$year2 -df$year1)
This should work, assuming year2 is some kind of current year, year1 is the birth year, and nobody was born before 1918.
Example:
df <- data.frame(year1 = sample(18:99, 1000, replace = TRUE),
year2 = sample(1:99, 1000, replace = TRUE))
> head(df)
year1 year2
1 27 88
2 41 55
3 90 36
4 81 93
5 56 60
6 27 61
df$age <- ifelse(df$year2 < df$year1, df$year2 - df$year1 + 100, df$year2 -df$year1)
> head(df)
year1 year2 age
1 73 88 15
2 50 17 67
3 47 41 94
4 54 43 89
5 36 82 46
6 62 85 23
Using your example data:
dat <- data.frame(year1=c("98","99","00","01","02"),
year2=c("03","04","05","06","07"))
dat$age <- ifelse(as.numeric(as.character(dat$year2)) < as.numeric(as.character(dat$year1)),
as.numeric(as.character(dat$year2)) - as.numeric(as.character(dat$year1)) + 100,
as.numeric(as.character(dat$year2)) - as.numeric(as.character(dat$year1)))
> dat
year1 year2 age
1 98 03 5
2 99 04 5
3 00 05 5
4 01 06 5
5 02 07 5
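The ifelse() rule above (add 100 whenever year2 is smaller than year1) can also be written as a difference modulo 100, because R's %% operator returns a result with the sign of the divisor. This is a sketch of that alternative under the same assumption as the answer, namely that no age is 100 or more; the variable names y1 and y2 are introduced here for illustration.

```r
dat <- data.frame(year1 = c("98", "99", "00", "01", "02"),
                  year2 = c("03", "04", "05", "06", "07"),
                  stringsAsFactors = FALSE)

# as.character() first, so this also works if the columns are factors.
y1 <- as.integer(as.character(dat$year1))
y2 <- as.integer(as.character(dat$year2))

# A negative difference such as 3 - 98 = -95 wraps around to 5.
dat$age <- (y2 - y1) %% 100L
dat$age
```

As with the ifelse() version, a person aged 100 or more would be reported with an age 100 years too small.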
Answer 3 (score: 0)
One approach is to use as.Date with a dplyr chain:
library(dplyr)  # provides %>% and mutate()

dat %>%
mutate(year1 = as.Date(year1, format = "%y"),
year2 = as.Date(year2, format = "%y")) %>%
mutate(age = year2 - year1)
Returns:
year1 year2 age
1 1998-10-26 2003-10-26 1826 days
2 1999-10-26 2004-10-26 1827 days
3 2000-10-26 2005-10-26 1826 days
4 2001-10-26 2006-10-26 1826 days
5 2002-10-26 2007-10-26 1826 days
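P.S. as.Date fills in a default day and month for both columns, but since they are the same in both, they do not affect the difference calculation. Note that age is a difference in days, not years.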
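Since the Date subtraction yields days, one way to get back to years is to divide by an average year length and round. The sketch below uses base R only; the 365.25 divisor and the 1 January anchor date are assumptions made for this illustration, not part of the original answer.

```r
dat <- data.frame(year1 = c("98", "99", "00", "01", "02"),
                  year2 = c("03", "04", "05", "06", "07"),
                  stringsAsFactors = FALSE)

# Pin both columns to 1 January so the result does not depend on today's date.
d1 <- as.Date(paste0(dat$year1, "-01-01"), format = "%y-%m-%d")
d2 <- as.Date(paste0(dat$year2, "-01-01"), format = "%y-%m-%d")

# Subtracting Dates gives a difftime in days; convert it to plain numbers.
days <- as.numeric(d2 - d1)

# Approximate years: divide by the average year length and round.
dat$age <- round(days / 365.25)
dat$age
```

Rounding absorbs the one-day variation caused by leap years, which is why 1826 and 1827 days both come out as 5.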