Negative ages when calculating age from date of birth and a reference date with lubridate

Time: 2019-01-09 01:55:39

Tags: r lubridate

I have data that looks like this. It is a data frame containing the dates of birth of many people (along with other information).

library(tidyr)
library(dplyr)
library(magrittr)
library(lubridate)

df <- data.frame(
DATE_OF_BIRTH = c("20/10/01" , "15/04/88", "16/12/58", "15/10/91", "09/02/66", "02/07/03", "20/08/96", "22/04/99", "17/04/87", "17/08/56",
                "28/05/40", "26/07/59", "02/04/65", "17/08/93", "01/08/86", "30/07/01", "03/09/75", "17/09/65", "16/02/95", "11/06/03",
                "26/10/64", "25/02/73", "07/02/90", "31/03/38", "05/03/83", "10/02/61", "01/07/40", "15/08/51", "19/12/75", "25/11/58",
                "05/11/81", "05/12/02", "06/05/40", "23/09/69", "17/04/48", "02/07/58", "04/03/98", "26/11/03", "08/01/91", "23/12/07",
                "05/05/01", "23/10/08", "01/01/09", "29/10/63", "26/03/09", "03/02/75", "03/09/04", "17/01/80", "19/03/11", "05/07/83")
)

What I want to do is calculate each person's age as of 1 July 2017 from their date of birth.

To calculate the age, I use the following code:

df <- df %>%
  mutate(age = interval(start = dmy(DATE_OF_BIRTH), end = dmy('01/07/17')) /
           duration(num = 1, units = "years"))

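The output is correct for some people, but for others I get a negative value. For those people, the actual age appears to be the absolute value of the computed age, abs(age), plus 17.

Can someone tell me how to get only positive ages? Thanks.

I have already seen this question: Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date, but it does not cover the problem of negative ages in the output.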

2 answers:

Answer 0 (score: 3)

If you inspect the output of the dmy function:

head(df$DATE_OF_BIRTH)
#[1] "20/10/01" "15/04/88" "16/12/58" "15/10/91" "09/02/66" "02/07/03"

head(dmy(df$DATE_OF_BIRTH))
#[1] "2001-10-20" "1988-04-15" "2058-12-16" "1991-10-15" "2066-02-09" "2003-07-02"

R interprets two-digit years 00-68 as 2000-2068 and 69-99 as 1969-1999. So 58 is read as 2058 and 66 as 2066, while 88 is read as 1988.

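From ?strptime:

  %y   Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19; that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say 'it is expected that in a future version the default century inferred from a 2-digit year will change'.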
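The pivot described above can be checked without lubridate. The following is a minimal sketch in base R, using a few dates from the question; the variable names `two_digit`, `parsed` and `parsed_year` are only illustrative.

```r
# Base R only: as.Date() with %y applies the pivot that ?strptime describes.
two_digit <- c("16/12/58", "09/02/66", "23/09/69", "15/04/88")
parsed <- as.Date(two_digit, format = "%d/%m/%y")
print(parsed)

# Extract the four-digit year that R inferred for each input.
# Years 00-68 land in the 2000s, years 69-99 in the 1900s.
parsed_year <- as.integer(format(parsed, "%Y"))
print(parsed_year)
```

Any date of birth whose two-digit year is 68 or lower therefore ends up in the 2000s, which is what produces the negative ages for people born before 1969.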


For the negative values, you can add 100 to get the equivalent positive age:

library(dplyr)
library(lubridate)

df %>%
  mutate(age = interval(start = dmy(DATE_OF_BIRTH), end = dmy('01/07/17')) / 
          duration(num = 1, units = "years"), 
          age = if_else(age < 0, age + 100, age))


#   DATE_OF_BIRTH       age
#1       20/10/01 15.706849
#2       15/04/88 29.230137
#3       16/12/58 58.512329
#4       15/10/91 25.728767
#5       09/02/66 51.356164
#6       02/07/03 14.008219
#7       20/08/96 20.876712
#....

To get the difference in years, you can also divide the interval by years(1), like this:

df %>%
  mutate(age = interval(dmy(DATE_OF_BIRTH), dmy('01/07/17')) / years(1),
         age = if_else(age < 0, age + 100, age))
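Dividing by a fixed-length year can drift by about a day around birthdays. If ages in whole completed years are what is needed, a calendar comparison avoids that. The following is a minimal base R sketch, assuming the dates of birth have already been corrected to the right century; `completed_years` is a hypothetical helper name, not a lubridate function.

```r
# Hypothetical helper: age in completed years on the reference date.
completed_years <- function(dob, ref) {
  d <- as.POSIXlt(dob)
  r <- as.POSIXlt(ref)
  # TRUE when the birthday has already occurred in the reference year
  had_birthday <- (r$mon > d$mon) | (r$mon == d$mon & r$mday >= d$mday)
  (r$year - d$year) - ifelse(had_birthday, 0L, 1L)
}

ref <- as.Date("2017-07-01")
# Dates from the question, written with four-digit years (an assumption
# about the intended century for 58 and 66).
dob <- as.Date(c("2001-10-20", "1958-12-16", "1966-02-09", "2003-07-02"))
ages <- completed_years(dob, ref)
print(ages)
```

The last date is the edge case: someone born on 2 July 2003 is still one day short of their 14th birthday on 1 July 2017, so the helper returns 13.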

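Answer 1 (score: 0)

You need to clean the data first; lubridate and as.Date() will both produce similar results here.

For any converted year that is later than today (an illogical DoB), subtract 100 years after converting the date so that it makes sense. The code below includes this cleaning step. Good luck with your data analysis!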

library(tidyr)
library(dplyr)
library(magrittr)
library(lubridate)



df <- data.frame(
  DATE_OF_BIRTH = c("20/10/01" , "15/04/88", "16/12/58", "15/10/91", "09/02/66", "02/07/03", "20/08/96", "22/04/99", "17/04/87", "17/08/56",
                    "28/05/40", "26/07/59", "02/04/65", "17/08/93", "01/08/86", "30/07/01", "03/09/75", "17/09/65", "16/02/95", "11/06/03",
                    "26/10/64", "25/02/73", "07/02/90", "31/03/38", "05/03/83", "10/02/61", "01/07/40", "15/08/51", "19/12/75", "25/11/58",
                    "05/11/81", "05/12/02", "06/05/40", "23/09/69", "17/04/48", "02/07/58", "04/03/98", "26/11/03", "08/01/91", "23/12/07",
                    "05/05/01", "23/10/08", "01/01/09", "29/10/63", "26/03/09", "03/02/75", "03/09/04", "17/01/80", "19/03/11", "05/07/83")

)


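# Set the reference date for the age calculation
comparisondate <- as.Date("2017-07-01")

# Parse the dates with lubridate; two-digit years 00-68 are read as 2000-2068
df$DOBnew <- dmy(df$DATE_OF_BIRTH)
# Calculate a first-pass age (52.25 weeks per year is an approximation)
df$age <- round(as.numeric(difftime(comparisondate, df$DOBnew, units = "weeks") / 52.25), digits = 1)
# A negative age means the year landed in the wrong century: move it back 100 years
df[df$age < 0, "DOBnew"] <- df[df$age < 0, "DOBnew"] %m-% years(100)

# Recalculate the age from the corrected dates
df$age <- round(as.numeric(difftime(comparisondate, df$DOBnew, units = "weeks") / 52.25), digits = 1)
df$age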



[1] 15.7 29.2 58.5 25.7 51.3 14.0 20.8 18.2 30.2 60.8 77.0 57.9 52.2 23.8 30.9 15.9 41.8 51.7 22.3 14.0
[21] 52.6 44.3 27.4 79.1 34.3 56.3 76.9 65.8 41.5 58.5 35.6 14.6 77.0 47.7 69.1 58.9 19.3 13.6 26.4  9.5
[41] 16.1  8.7  8.5 53.6  8.3 42.3 12.8 37.4  6.3 33.9

all(df$age>0)
[1] TRUE
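The same idea, correcting the dates rather than the ages, can also be sketched in base R without lubridate. This is only an illustration under stated assumptions: `shift_future_dates` is a hypothetical helper, and the age uses an average year of 365.25 days.

```r
# Hypothetical helper (not part of lubridate): move any date that falls
# after `ref` back by 100 years, leaving all other dates unchanged.
shift_future_dates <- function(x, ref) {
  lt <- as.POSIXlt(x)
  future <- !is.na(x) & x > ref
  lt$year[future] <- lt$year[future] - 100L
  as.Date(lt)
}

ref <- as.Date("2017-07-01")
dob_raw <- c("20/10/01", "16/12/58", "09/02/66", "15/04/88")
dob <- shift_future_dates(as.Date(dob_raw, format = "%d/%m/%y"), ref)
print(dob)

# Decimal age, assuming an average year length of 365.25 days.
age <- as.numeric(difftime(ref, dob, units = "days")) / 365.25
print(round(age, 1))
```

Because the correction happens on the dates, every later calculation (age, birth year, cohort) sees the right century. The rule does assume that nobody in the data was born after the reference date and that nobody is 100 or older.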