A question from a new R user: I have data with consecutive enabled_datetime and disabled_datetime values, as shown below:
x<-as.data.frame(cbind(
supplier_id=281743,
enabled_datetime=c('2016-06-13 13:31:02','2016-06-14 07:39:19','2016-06-14 12:36:03','2016-06-16 13:44:30','2016-06-17 06:42:14'),
disabled_datetime = c('2016-06-14 07:39:19','2016-06-14 12:36:03','2016-06-16 13:44:30','2016-06-17 06:42:14', NA),
discount=c(25,15,15,10,30))
)
x
supplier_id enabled_datetime disabled_datetime discount
281743 2016-06-13 13:31:02 2016-06-14 07:39:19 25
281743 2016-06-14 07:39:19 2016-06-14 12:36:03 15
281743 2016-06-14 12:36:03 2016-06-16 13:44:30 15
281743 2016-06-16 13:44:30 2016-06-17 06:42:14 10
281743 2016-06-17 06:42:14 <NA> 30
What I want to transform it into is this:
supplier_id enabled_datetime disabled_datetime discount
281743 2016-06-13 13:31:02 2016-06-14 07:39:19 25
281743 2016-06-14 07:39:19 2016-06-16 13:44:30 15
281743 2016-06-16 13:44:30 2016-06-17 06:42:14 10
281743 2016-06-17 06:42:14 <NA> 30
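That is, merge rows that have the same supplier_id and discount and whose enabled_datetime and disabled_datetime are consecutive (one row's disabled_datetime equals the next row's enabled_datetime). All I can think of is a for loop. Does anyone know a different way to do this? Thanks in advance.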
Answer 0 (score: 2)
library(dplyr)

df <- data.frame(supplier_id = c(281743,281743,281743,281743,281743),
                 enabled_datetime = c("2016-06-13 13:31:02","2016-06-14 07:39:19","2016-06-14 12:36:03","2016-06-16 13:44:30","2016-06-17 06:42:14"),
                 disabled_datetime = c("2016-06-14 07:39:19","2016-06-14 12:36:03","2016-06-16 13:44:30","2016-06-17 06:42:14",NA),
                 discount = c(25,15,15,10,30))
# Parse the timestamp strings as POSIXct so they can be compared and shifted
df <- df %>%
  mutate(enabled_datetime = as.POSIXct(enabled_datetime, format = "%Y-%m-%d %H:%M:%S"),
         disabled_datetime = as.POSIXct(disabled_datetime, format = "%Y-%m-%d %H:%M:%S"))
# Within each supplier_id/discount group, keep rows whose disabled_datetime equals the next row's enabled_datetime, then extend them to the next row's disabled_datetime
subdf1 <- df %>%
  group_by(supplier_id, discount) %>%
  mutate(enabled_datetime_lead = lead(enabled_datetime),
         disabled_datetime_lead = lead(disabled_datetime)) %>%
  filter(disabled_datetime == enabled_datetime_lead) %>%
  mutate(disabled_datetime = disabled_datetime_lead) %>%
  select(-enabled_datetime_lead, -disabled_datetime_lead) %>%
  ungroup()
# Groups that needed no merging (compare against subdf1; resdf does not exist yet at this point)
subdf2 <- anti_join(df, subdf1, by = c("supplier_id", "discount"))
# Stack the merged rows and the untouched rows
resdf <- union(subdf1, subdf2)
The result is:
supplier_id enabled_datetime disabled_datetime discount
<dbl> <time> <time> <dbl>
1 281743 2016-06-14 07:39:19 2016-06-16 13:44:30 15
2 281743 2016-06-13 13:31:02 2016-06-14 07:39:19 25
3 281743 2016-06-16 13:44:30 2016-06-17 06:42:14 10
4 281743 2016-06-17 06:42:14 <NA> 30
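Edit note: changed the final statement from full_join to union, because with full_join the final result had two extra columns. The behavior differed from what was originally observed.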
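As a possible alternative, here is a minimal sketch that also avoids a for loop and handles runs of any length. The lead()-based approach above merges one pair of adjacent rows at a time, so three or more consecutive rows with the same discount would, as far as I can tell, come out as overlapping rows. The idea here is to flag the start of each run, number the runs with cumsum(), and collapse each run to one row. The column names new_run and run_id are made up for illustration, the "UTC" time zone is an assumption, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same sample data as in the question; tz = "UTC" is an assumption
df <- data.frame(
  supplier_id = rep(281743, 5),
  enabled_datetime = as.POSIXct(c("2016-06-13 13:31:02", "2016-06-14 07:39:19",
                                  "2016-06-14 12:36:03", "2016-06-16 13:44:30",
                                  "2016-06-17 06:42:14"), tz = "UTC"),
  disabled_datetime = as.POSIXct(c("2016-06-14 07:39:19", "2016-06-14 12:36:03",
                                   "2016-06-16 13:44:30", "2016-06-17 06:42:14",
                                   NA), tz = "UTC"),
  discount = c(25, 15, 15, 10, 30)
)

merged <- df %>%
  arrange(supplier_id, enabled_datetime) %>%
  group_by(supplier_id) %>%
  mutate(
    # A row starts a new run when the discount changes or when it does not
    # begin exactly where the previous row ended. Comparisons involving NA
    # (the first row of a supplier, or a missing end time) count as a new run.
    new_run = coalesce(
      discount != lag(discount) | enabled_datetime != lag(disabled_datetime),
      TRUE
    ),
    run_id = cumsum(new_run)  # hypothetical helper column numbering the runs
  ) %>%
  group_by(supplier_id, run_id) %>%
  summarise(
    enabled_datetime = first(enabled_datetime),   # start of the run
    disabled_datetime = last(disabled_datetime),  # end of the run (may be NA)
    discount = first(discount),
    .groups = "drop"
  ) %>%
  select(supplier_id, enabled_datetime, disabled_datetime, discount)

print(merged)
```

Because the rows are sorted first and each run is collapsed in place, the result stays in chronological order, which matches the layout asked for in the question.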