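I have a data frame named "enrollments":
enrolled_at, unenrolled_at and fully_participated_at are factors. I want to add a new column to my data frame that gives the difference in hours between the two non-empty attributes. The type of this new column does not matter, but it must display the time in this format (HH MM SS).
I want to do something like the following pseudocode: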
if (unenrolled_at == empty && fully_participated_at != empty)
    newAttributeValue = fully_participated_at - enrolled_at
else if (unenrolled_at != empty && fully_participated_at == empty)
    newAttributeValue = unenrolled_at - enrolled_at
else
    do nothing
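Edit: I tried all the methods on this site for doing this, but none of them worked. The times are stored in my data frame as factors, whereas the solutions on the site are for factor - factor or (string) time - (string) time. I also tried the "as.character" and "as.Date" functions separately. So my question is not a duplicate. Rolando Tamayo suggested a different approach to my problem, but it gave me this error: "Error in ymd_hms(comments$unenrolled_at): could not find function "ymd_hms"" (I have installed the lubridate package).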
Answer 0: (score: 1)
You can use the lubridate package:
library(lubridate) #Load the package; installing it alone does not make ymd_hms() available
#Create a df with dates
df<-tibble::tibble(
enrolled_at=as.factor(c("2002-06-09 12:45:40 UTC","2003-01-29 09:30:40 UTC",
"2002-09-04 16:45:40 UTC")),
unenrolled_at=as.factor(c("2002-11-13 20:00:40 UTC",
"2002-07-07 17:30:40","2002-07-07 17:30:40 UTC")))
df
# A tibble: 3 x 2
  enrolled_at             unenrolled_at
  <fctr>                  <fctr>
1 2002-06-09 12:45:40 UTC 2002-11-13 20:00:40 UTC
2 2003-01-29 09:30:40 UTC 2002-07-07 17:30:40
3 2002-09-04 16:45:40 UTC 2002-07-07 17:30:40 UTC
#Check Class
class(df$enrolled_at)
[1] "factor"
#Check class after function ymd_hms
class(ymd_hms(df$enrolled_at))
[1] "POSIXct" "POSIXt"
#Calculate the difference (a difftime, reported in days here)
dif<-ymd_hms(df$unenrolled_at)-ymd_hms(df$enrolled_at)
#Difference as a period
as.period(dif)
[1] "157d 7H 15M 0S" "-205d -16H 0M 0S" "-58d -23H -15M 0S"
#Add as a column in df
df$newAttributeValue<-as.period(ymd_hms(df$unenrolled_at)-ymd_hms(df$enrolled_at))
df
# A tibble: 3 x 3
  enrolled_at             unenrolled_at           newAttributeValue
  <fctr>                  <fctr>                  <S4: Period>
1 2002-06-09 12:45:40 UTC 2002-11-13 20:00:40 UTC 157d 7H 15M 0S
2 2003-01-29 09:30:40 UTC 2002-07-07 17:30:40     -205d -16H 0M 0S
3 2002-09-04 16:45:40 UTC 2002-07-07 17:30:40 UTC -58d -23H -15M 0S
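The example above subtracts only unenrolled_at, so it does not cover the either/or rule from the question's pseudocode or the HH MM SS display. Here is a minimal sketch of one way to do both in base R. The sample values, the helper names to_time and fmt_hms, the column name time_diff, and the assumption that "empty" means an empty string (or NA) are all mine, not from the question. Hours are allowed to exceed 24, and the separator is a colon, which is easy to change in the sprintf format.

```r
# Hypothetical sample data: column names follow the question, values are made up.
enrollments <- data.frame(
  enrolled_at = factor(c("2002-06-09 12:45:40 UTC",
                         "2003-01-29 09:30:40 UTC",
                         "2002-09-04 16:45:40 UTC")),
  unenrolled_at = factor(c("2002-11-13 20:00:40 UTC", "", "")),
  fully_participated_at = factor(c("", "2003-02-01 10:00:00 UTC", ""))
)

# Hypothetical helper: factor -> character -> POSIXct.
# Empty strings are treated as missing (assumption about what "empty" means).
to_time <- function(x) {
  x <- as.character(x)
  x[!nzchar(x)] <- NA
  x <- sub(" UTC$", "", x)  # drop the trailing zone label before parsing
  as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

# Hypothetical helper: seconds -> "HH:MM:SS" text, NA stays NA.
fmt_hms <- function(secs) {
  out <- rep(NA_character_, length(secs))
  ok <- !is.na(secs)
  s <- round(abs(secs[ok]))
  out[ok] <- sprintf("%s%02d:%02d:%02d",
                     ifelse(secs[ok] < 0, "-", ""),
                     as.integer(s %/% 3600),
                     as.integer((s %% 3600) %/% 60),
                     as.integer(s %% 60))
  out
}

enrolled   <- to_time(enrollments$enrolled_at)
unenrolled <- to_time(enrollments$unenrolled_at)
completed  <- to_time(enrollments$fully_participated_at)

# The two branches of the pseudocode; every other row is left as NA.
only_completed  <- is.na(unenrolled) & !is.na(completed)
only_unenrolled <- !is.na(unenrolled) & is.na(completed)

secs <- rep(NA_real_, nrow(enrollments))
secs[only_completed] <- as.numeric(difftime(completed[only_completed],
                                            enrolled[only_completed],
                                            units = "secs"))
secs[only_unenrolled] <- as.numeric(difftime(unenrolled[only_unenrolled],
                                             enrolled[only_unenrolled],
                                             units = "secs"))

enrollments$time_diff <- fmt_hms(secs)
enrollments$time_diff
```

With this sample data the first row uses unenrolled_at, the second uses fully_participated_at, and the third stays NA because both end times are empty. The new column is plain character, which fits the question's remark that the type does not matter as long as the display format is right.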