How to merge rows with consecutive datetimes

Time: 2016-06-27 12:53:17

Tags: r

A question from a new R user: I have data with consecutive enabled_datetime and disabled_datetime values, as shown below:

x<-as.data.frame(cbind(
      supplier_id=281743,
      enabled_datetime=c('2016-06-13 13:31:02','2016-06-14 07:39:19','2016-06-14 12:36:03','2016-06-16 13:44:30','2016-06-17 06:42:14'),
      disabled_datetime = c('2016-06-14 07:39:19','2016-06-14 12:36:03','2016-06-16 13:44:30','2016-06-17 06:42:14',  NA),
      discount=c(25,15,15,10,30))
)
x

supplier_id    enabled_datetime   disabled_datetime discount
      281743 2016-06-13 13:31:02 2016-06-14 07:39:19       25
      281743 2016-06-14 07:39:19 2016-06-14 12:36:03       15
      281743 2016-06-14 12:36:03 2016-06-16 13:44:30       15
      281743 2016-06-16 13:44:30 2016-06-17 06:42:14       10
      281743 2016-06-17 06:42:14                <NA>       30

What I want to transform it into is this:

supplier_id    enabled_datetime   disabled_datetime discount
      281743 2016-06-13 13:31:02 2016-06-14 07:39:19       25
      281743 2016-06-14 07:39:19 2016-06-16 13:44:30       15
      281743 2016-06-16 13:44:30 2016-06-17 06:42:14       10
      281743 2016-06-17 06:42:14                <NA>       30

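That is, merge rows that have the same supplier_id and discount and whose time ranges are consecutive (one row's disabled_datetime equals the next row's enabled_datetime). The only approach I can think of is a for loop. Does anyone know a different way to do this? Thanks in advance.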

1 answer:

Answer 0: (score: 2)

library(dplyr)

 df <- data.frame(supplier_id = c(281743,281743,281743,281743,281743),
                 enabled_datetime = c("2016-06-13 13:31:02","2016-06-14 07:39:19","2016-06-14 12:36:03","2016-06-16 13:44:30","2016-06-17 06:42:14"),
                 disabled_datetime = c("2016-06-14 07:39:19","2016-06-14 12:36:03","2016-06-16 13:44:30","2016-06-17 06:42:14",NA),
                 discount = c(25,15,15,10,30))

# Parse the timestamp strings into POSIXct date-times
df <- df %>%
  mutate(enabled_datetime = as.POSIXct(enabled_datetime, format = "%Y-%m-%d %H:%M:%S"),
         disabled_datetime = as.POSIXct(disabled_datetime, format = "%Y-%m-%d %H:%M:%S"))

# Within each supplier/discount group, keep rows whose end time equals the
# next row's start time, and extend them to the next row's end time
subdf1 <- df %>%
  group_by(supplier_id, discount) %>%
  mutate(enabled_datetime_lead = lead(enabled_datetime),
         disabled_datetime_lead = lead(disabled_datetime)) %>%
  filter(disabled_datetime == enabled_datetime_lead) %>%
  mutate(disabled_datetime = disabled_datetime_lead) %>%
  select(-enabled_datetime_lead, -disabled_datetime_lead) %>%
  ungroup()

# Rows whose supplier/discount combination was not merged above
subdf2 <- anti_join(df, subdf1, by = c("supplier_id", "discount"))

# Stack the merged rows and the untouched rows into one result
resdf <- union(subdf1, subdf2)
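
On the final step: joining two data frames on only supplier_id and discount leaves any other same-named columns duplicated with .x/.y suffixes, whereas union simply stacks the rows. A minimal sketch of the difference, using two toy data frames a and b that are made up for illustration and are not part of the question's data:

```r
library(dplyr)

# Two toy data frames with identical column names (illustrative only)
a <- data.frame(supplier_id = 1, discount = 15,
                enabled_datetime = "2016-06-14 07:39:19")
b <- data.frame(supplier_id = 1, discount = 25,
                enabled_datetime = "2016-06-13 13:31:02")

# Joining on a subset of the columns duplicates the remaining ones
joined <- full_join(a, b, by = c("supplier_id", "discount"))
names(joined)

# union keeps the original columns and stacks the distinct rows
stacked <- union(a, b)
names(stacked)
```

Here joined ends up with enabled_datetime.x and enabled_datetime.y, while stacked keeps the original three columns.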

The result is

supplier_id    enabled_datetime   disabled_datetime discount
        <dbl>              <time>              <time>    <dbl>
1      281743 2016-06-14 07:39:19 2016-06-16 13:44:30       15
2      281743 2016-06-13 13:31:02 2016-06-14 07:39:19       25
3      281743 2016-06-16 13:44:30 2016-06-17 06:42:14       10
4      281743 2016-06-17 06:42:14                <NA>       30

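Edit note: changed the final statement from full_join to union, because the full_join result had two extra columns. The behavior differed from what I originally observed.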
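
The lead-based approach above merges a row only with the row directly after it, so a run of three or more consecutive rows with the same discount would not collapse into a single row. A possible generalization, offered as a sketch rather than as part of the original answer, is to flag where each run starts and number the runs with cumsum. The helper columns prev_disabled, prev_discount, continues and run_id are names chosen here for illustration, the timestamps are assumed to be in UTC, and the .groups argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

df <- data.frame(
  supplier_id = 281743,
  enabled_datetime = as.POSIXct(c("2016-06-13 13:31:02", "2016-06-14 07:39:19",
                                  "2016-06-14 12:36:03", "2016-06-16 13:44:30",
                                  "2016-06-17 06:42:14"), tz = "UTC"),
  disabled_datetime = as.POSIXct(c("2016-06-14 07:39:19", "2016-06-14 12:36:03",
                                   "2016-06-16 13:44:30", "2016-06-17 06:42:14",
                                   NA), tz = "UTC"),
  discount = c(25, 15, 15, 10, 30)
)

merged <- df %>%
  arrange(supplier_id, enabled_datetime) %>%
  group_by(supplier_id) %>%
  mutate(
    prev_disabled = lag(disabled_datetime),
    prev_discount = lag(discount),
    # TRUE when this row starts exactly where the previous row ended
    # and carries the same discount; NA comparisons count as FALSE
    continues = coalesce(enabled_datetime == prev_disabled &
                           discount == prev_discount, FALSE),
    # Every row that does not continue a run starts a new one
    run_id = cumsum(!continues)
  ) %>%
  group_by(supplier_id, discount, run_id) %>%
  summarise(
    enabled_datetime = first(enabled_datetime),
    disabled_datetime = last(disabled_datetime),
    .groups = "drop"
  ) %>%
  arrange(supplier_id, enabled_datetime) %>%
  select(supplier_id, enabled_datetime, disabled_datetime, discount)

merged
```

On the question's data this yields four rows, with the two discount-15 rows collapsed into one running from 2016-06-14 07:39:19 to 2016-06-16 13:44:30. Because runs are numbered rather than paired, the same code should also handle longer runs without a for loop.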