Here is my input data frame:
ID     RID  PID  check_in          check_out
10000  1    1    18-07-2014 9.30   19-07-2014 9.30
10000  2    2    21-07-2014 9.30   22-07-2014 9.30
10000  3    2    23-10-2012 9.30   07-02-2013 9.30
10000  4    1    25-09-2012 9.30   30-09-2012 9.30
I am trying to get the following output data frame:
ID     RID  PID  check_in          check_out         MonthYear  Days
10000  1    1    18-07-2014 9.30   19-07-2014 9.30   Jul-14     1
10000  2    2    21-07-2014 9.30   22-07-2014 9.30   Jul-14     1
10000  3    2    23-10-2012 9.30   07-02-2013 9.30   Oct-12     8
10000  3    2    23-10-2012 9.30   07-02-2013 9.30   Nov-12     30
10000  3    2    23-10-2012 9.30   07-02-2013 9.30   Dec-12     31
10000  3    2    23-10-2012 9.30   07-02-2013 9.30   Jan-13     31
10000  3    2    23-10-2012 9.30   07-02-2013 9.30   Feb-13     6
10000  4    1    25-09-2012 9.30   30-09-2012 9.30   Sep-12     5
Please help.
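Answer 0 (score: 2)
Based on the expected output, it looks like we want to expand each row into one row per calendar month between 'check_in' and 'check_out', and count the days that fall within each of those months.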
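library(tidyverse)
library(lubridate)  # dmy_hm, floor_date, days_in_month, day
library(zoo)        # as.yearmon

df %>%
  # parse the "18-07-2014 9.30" style strings into date-times
  mutate_at(vars(matches('check')), dmy_hm) %>%
  group_by(ID, RID) %>%
  # one first-of-month date for every calendar month the stay touches
  mutate(newdate = list(floor_date(seq(as.Date(as.yearmon(check_in)),
                                       as.Date(as.yearmon(check_out)), by = '1 month')))) %>%
  unnest(newdate) %>%
  # multi-month stay: the first month counts the days left after check-in,
  # middle months count in full, the last month counts day(check_out) - 1;
  # single-month stay: plain difference of the day numbers
  mutate(Days = if(n() > 1) c(days_in_month(newdate)[-n()] -
                              c(first(day(check_in)), rep(0, n()-2)), last(day(check_out))-1)
                else day(check_out) - day(check_in)) %>%
  ungroup %>%
  mutate(MonthYear = format(newdate, "%b-%y")) %>%
  select(all_of(names(df)), MonthYear, Days)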
# A tibble: 8 x 7
#     ID   RID   PID check_in            check_out           MonthYear  Days
#  <int> <int> <int> <dttm>              <dttm>              <chr>     <dbl>
#1 10000     1     1 2014-07-18 09:30:00 2014-07-19 09:30:00 Jul-14     1.00
#2 10000     2     2 2014-07-21 09:30:00 2014-07-22 09:30:00 Jul-14     1.00
#3 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Oct-12     8.00
#4 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Nov-12    30.0
#5 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Dec-12    31.0
#6 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Jan-13    31.0
#7 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Feb-13     6.00
#8 10000     4     1 2012-09-25 09:30:00 2012-09-30 09:30:00 Sep-12     5.00
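For readers who prefer to see the month-splitting rule without the pipeline, here is a minimal base R sketch of the same idea. The helper name `split_by_month` is hypothetical (it does not appear in the answer above), and it assumes both arguments are already `Date` values. It reproduces the day-count convention of the expected output in the question, it is not a general-purpose interval splitter.

```r
# Hypothetical helper, for illustration only: one row per calendar month
# touched by a stay, counted the same way as the expected output.
split_by_month <- function(check_in, check_out) {
  first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))
  day_of <- function(d) as.integer(format(d, "%d"))

  # First day of every month from the check-in month to the check-out month
  months <- seq(first_of_month(check_in), first_of_month(check_out), by = "month")
  n <- length(months)

  # Month lengths: gaps between consecutive month starts (one extra boundary)
  bounds <- seq(months[1], by = "month", length.out = n + 1)
  month_len <- as.integer(diff(bounds))

  if (n == 1) {
    # Stay within a single month: plain difference of day numbers
    days <- day_of(check_out) - day_of(check_in)
  } else {
    days <- month_len                             # middle months count in full
    days[1] <- month_len[1] - day_of(check_in)    # days left after check-in
    days[n] <- day_of(check_out) - 1L             # convention from the question
  }
  # Month abbreviations depend on the session locale
  data.frame(MonthYear = format(months, "%b-%y"), Days = days)
}

# as.Date ignores the trailing " 9.30" once the format has been matched
long_stay <- split_by_month(as.Date("23-10-2012 9.30", format = "%d-%m-%Y"),
                            as.Date("07-02-2013 9.30", format = "%d-%m-%Y"))
long_stay$Days
```

For the stay with RID 3 this gives 8, 30, 31, 31 and 6 days. Those sum to 106, one fewer than the 107 days that elapse between the two timestamps, because this convention takes one day off the last month. If the monthly counts should add up to the full stay, the last-month rule would need adjusting.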
Answer 1 (score: 0)
I'm not sure what you want to do with the MonthYear column, but it looks like you are trying to do something like the following.
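library(dplyr)

# strip.white drops the space after each comma; keep the dates as character
df <- read.table(text=
"ID, RID, PID, check_in, check_out
10000, 1, 1, 18-07-2014 9.30, 19-07-2014 9.30
10000, 2, 2, 21-07-2014 9.30, 22-07-2014 9.30
10000, 3, 2, 23-10-2012 9.30, 07-02-2013 9.30
10000, 4, 1, 25-09-2012 9.30, 30-09-2012 9.30",
  header = TRUE, sep = ",", strip.white = TRUE, stringsAsFactors = FALSE)

df %>%
  # a fixed time zone keeps daylight-saving shifts out of the differences
  mutate_at(vars(check_in, check_out),
            ~as.POSIXct(., format = "%d-%m-%Y %H.%M", tz = "UTC")) %>%
  # request days explicitly rather than letting difftime pick the unit
  mutate(Days = difftime(check_out, check_in, units = "days"))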
#      ID RID PID            check_in           check_out     Days
# 1 10000   1   1 2014-07-18 09:30:00 2014-07-19 09:30:00   1 days
# 2 10000   2   2 2014-07-21 09:30:00 2014-07-22 09:30:00   1 days
# 3 10000   3   2 2012-10-23 09:30:00 2013-02-07 09:30:00 107 days
# 4 10000   4   1 2012-09-25 09:30:00 2012-09-30 09:30:00   5 days
If you follow some of the advice here on how to provide a reproducible example, you will get better responses in the future. I'm feeling generous today, so I answered anyway. :)