Calculate the number of days per month between check-in and check-out dates in R

Time: 2018-03-10 14:41:34

Tags: r

Here is my input data frame:

   ID  RID  PID  check_in         check_out
10000    1    1  18-07-2014 9.30  19-07-2014 9.30
10000    2    2  21-07-2014 9.30  22-07-2014 9.30
10000    3    2  23-10-2012 9.30  07-02-2013 9.30
10000    4    1  25-09-2012 9.30  30-09-2012 9.30

I am trying to get the output data frame as:

   ID  RID  PID  check_in         check_out        MonthYear  Days
10000    1    1  18-07-2014 9.30  19-07-2014 9.30  Jul-14        1
10000    2    2  21-07-2014 9.30  22-07-2014 9.30  Jul-14        1
10000    3    2  23-10-2012 9.30  07-02-2013 9.30  Oct-12        8
10000    3    2  23-10-2012 9.30  07-02-2013 9.30  Nov-12       30
10000    3    2  23-10-2012 9.30  07-02-2013 9.30  Dec-12       31
10000    3    2  23-10-2012 9.30  07-02-2013 9.30  Jan-13       31
10000    3    2  23-10-2012 9.30  07-02-2013 9.30  Feb-13        6
10000    4    1  25-09-2012 9.30  30-09-2012 9.30  Sep-12        5

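Please help.

2 answers:

Answer 0 (score: 2)

Based on the expected output, it seems we want to expand the rows so that there is one row for each month between 'check_in' and 'check_out', and then count the number of days that fall within each of those monthly intervals.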

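library(tidyverse)
library(lubridate)  # dmy_hm, days_in_month, day
library(zoo)        # as.yearmon

df %>%
    # parse the "dd-mm-yyyy H.M" strings into date-times
    mutate_at(vars(matches('check')), dmy_hm) %>%
    group_by(ID, RID) %>%
    # one entry (first day of the month) per calendar month touched by the stay
    mutate(newdate = list(seq(as.Date(as.yearmon(check_in)),
                as.Date(as.yearmon(check_out)), by = 'month'))) %>%
    unnest(newdate) %>%
    # first month: days left after check-in; middle months: all days;
    # last month: day of check-out minus 1; single month: plain difference
    mutate(Days =  if(n() > 1) c(days_in_month(newdate)[-n()] - 
             c(first(day(check_in)), rep(0, n()-2)), last(day(check_out))-1) 
              else day(check_out)- day(check_in)) %>%
    ungroup %>%
    mutate(MonthYear = format(newdate, "%b-%y")) %>%
    select(names(df), MonthYear, Days)  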
# A tibble: 8 x 7
#     ID   RID   PID check_in            check_out           MonthYear  Days
#  <int> <int> <int> <dttm>              <dttm>              <chr>     <dbl>
#1 10000     1     1 2014-07-18 09:30:00 2014-07-19 09:30:00 Jul-14     1.00
#2 10000     2     2 2014-07-21 09:30:00 2014-07-22 09:30:00 Jul-14     1.00
#3 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Oct-12     8.00
#4 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Nov-12    30.0 
#5 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Dec-12    31.0 
#6 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Jan-13    31.0 
#7 10000     3     2 2012-10-23 09:30:00 2013-02-07 09:30:00 Feb-13     6.00
#8 10000     4     1 2012-09-25 09:30:00 2012-09-30 09:30:00 Sep-12     5.00
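For comparison, here is a minimal base-R sketch of the same "split a stay by month" idea. It is not the code from this answer: it simply lists every night of a stay and counts how many fall in each month. The helper name `nights_per_month` is hypothetical, and the sketch assumes the check-out date is at least one day after the check-in date. Under this convention Oct-12 gets 9 nights (23 to 31 October) rather than the 8 in the expected output, so the monthly counts add up to 107. The month labels produced by `%b` depend on the current locale.

```r
# Hypothetical helper: count the nights of one stay that fall in each month.
# Assumes check_out is at least one calendar day after check_in.
nights_per_month <- function(check_in, check_out) {
  # Keep only the date part (first 10 characters, "dd-mm-yyyy").
  d_in  <- as.Date(substr(check_in, 1, 10), format = "%d-%m-%Y")
  d_out <- as.Date(substr(check_out, 1, 10), format = "%d-%m-%Y")
  # One entry per night: from the check-in date to the day before check-out.
  nights <- seq(d_in, d_out - 1, by = "day")
  # Runs of identical month labels, kept in chronological order.
  runs <- rle(format(nights, "%b-%y"))
  data.frame(MonthYear = runs$values, Days = runs$lengths)
}

res <- nights_per_month("23-10-2012 9.30", "07-02-2013 9.30")
print(res)
```

Applying the helper to each row (for example with `Map`) and binding the results would give one row per stay and month, similar in shape to the output above.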

Answer 1 (score: 0)

I don't know what you are trying to do with the MonthYear column, but it seems you are trying to do something like the following.

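library(dplyr)

# strip.white drops the padding around the comma-separated fields
df <- read.table(text=
"ID,   RID, PID,   check_in,        check_out
10000,   1,   1,   18-07-2014 9.30, 19-07-2014 9.30
10000,   2,   2,   21-07-2014 9.30, 22-07-2014 9.30
10000,   3,   2,   23-10-2012 9.30, 07-02-2013 9.30
10000,   4,   1,   25-09-2012 9.30, 30-09-2012 9.30", header=TRUE, sep=",",
  strip.white=TRUE, stringsAsFactors=FALSE)

df %>%
  # parse both columns as date-times, then take the total length of each stay
  mutate_at(vars(check_in, check_out), ~as.POSIXct(.,format="%d-%m-%Y %H.%M")) %>%
  mutate(Days = check_out - check_in)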

#      ID RID PID            check_in           check_out     Days
# 1 10000   1   1 2014-07-18 09:30:00 2014-07-19 09:30:00   1 days
# 2 10000   2   2 2014-07-21 09:30:00 2014-07-22 09:30:00   1 days
# 3 10000   3   2 2012-10-23 09:30:00 2013-02-07 09:30:00 107 days
# 4 10000   4   1 2012-09-25 09:30:00 2012-09-30 09:30:00   5 days
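Subtracting two date-times lets R choose the unit of the resulting `difftime` automatically. If a plain number of days is needed, one option is to call `difftime()` with `units = "days"` and convert the result with `as.numeric()`. The sketch below uses a single row of the sample data and assumes UTC (the `tz = "UTC"` argument is an assumption, chosen so that daylight-saving changes cannot shift the result).

```r
# Parse one stay from the sample data; tz = "UTC" is an assumption made here
# to keep the difference free of daylight-saving shifts.
check_in  <- as.POSIXct("23-10-2012 9.30", format = "%d-%m-%Y %H.%M", tz = "UTC")
check_out <- as.POSIXct("07-02-2013 9.30", format = "%d-%m-%Y %H.%M", tz = "UTC")

# Ask for days explicitly, then drop the difftime class to get a plain number.
days <- as.numeric(difftime(check_out, check_in, units = "days"))
print(days)
```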

You will get better responses in the future if you follow some of the advice here on how to provide a reproducible example. I'm feeling generous today, so I answered anyway. :)