Suppose I have an ordered data frame like the following:
df <- data.frame(customer = c('cust1','cust1','cust2','cust3','cust3'),
start_month = as.Date(c('2016-03-01','2017-08-01','2016-03-01','2017-07-01','2017-10-01')),
price = c(29,29,59,99,59),
end_month = as.Date(c('2017-08-01',NA,'2017-09-01','2017-09-01',NA)));
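How can I write a script in R that applies the following business rule: if a customer ends and starts in the same month and the price has not changed, delete the latest transaction; otherwise, keep the transaction. The resulting data frame should look like this: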
new_df <- data.frame( customer = c('cust1','cust2','cust3','cust3'),
start_date = as.Date(c('2016-03-01','2016-03-01','2017-07-01','2017-10-01')),
price = c(29,59,99,59),
end_date = as.Date(c(NA,'2017-09-01','2017-09-01',NA)));
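In this example, cust1's 2017-08-01 transaction is ignored and filtered out because its price is the same as in the previous transaction. cust3's transaction, however, is kept because the price differs.
How can I do this in R?
Answer 0 (score: 1)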
library(dplyr)
df <- df %>% group_by(customer) %>% mutate(change = lag(price) - price)
> df
# A tibble: 5 x 5
# Groups: customer [3]
customer start_month price end_month change
<fctr> <date> <dbl> <date> <dbl>
1 cust1 2016-03-01 29 2017-08-01 NA
2 cust1 2017-08-01 29 NA 0
3 cust2 2016-03-01 59 2017-09-01 NA
4 cust3 2017-07-01 99 2017-09-01 NA
5 cust3 2017-10-01 59 NA 40
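The first entry for each customer always has change equal to NA, and we keep those entries. We then remove the rows where the price did not change: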
df <- df %>% filter(is.na(change) | change != 0)
> df
# A tibble: 4 x 5
# Groups: customer [3]
customer start_month price end_month change
<fctr> <date> <dbl> <date> <dbl>
1 cust1 2016-03-01 29 2017-08-01 NA
2 cust2 2016-03-01 59 2017-09-01 NA
3 cust3 2017-07-01 99 2017-09-01 NA
4 cust3 2017-10-01 59 NA 40
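To see where the NA in each customer's first row comes from, here is a minimal sketch of lag() on a bare vector. The names prices, prev and change_demo are made up for illustration. It calls dplyr::lag() explicitly because base R's stats::lag() behaves differently and does not shift a plain vector.

```r
library(dplyr)

# Hypothetical price history for a single customer
prices <- c(29, 29, 59)

# dplyr::lag() shifts values down by one position; the first slot has no
# predecessor, so it is filled with NA
prev <- dplyr::lag(prices)

# The difference is NA for the first element and 0 where the price repeats
change_demo <- prev - prices
print(change_demo)
```

Inside group_by(customer), the same shift restarts for every customer, which is why each customer's first row ends up with NA.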
All in one command:
library(dplyr)
df <-
df %>%
group_by(customer) %>%
mutate(change = lag(price) - price) %>%
filter(is.na(change) | change != 0)
I forgot to check whether the date changed:
library(dplyr)
df <-
df %>%
group_by(customer) %>%
mutate(change = lag(price) - price) %>%
mutate(date_change = lag(end_month) - start_month) %>%
filter((is.na(change) | change != 0) | (is.na(date_change) | date_change != 0))
This keeps every customer's first entry and removes the rows whose start date equals the previous end date and whose price has not changed.
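One thing the filter does not do is carry the dropped row's end_month back onto the row that is kept: in the question's expected result, cust1's remaining row has an end date of NA, whereas the filtered output still shows 2017-08-01. The following is a rough sketch of one way to get closer to that result, not part of the original answer. It assumes dplyr 1.0.0 or later (for the .groups argument), and the names merged, is_continuation and spell are made up for illustration. The column names stay start_month and end_month, as in df.

```r
library(dplyr)

df <- data.frame(
  customer    = c('cust1', 'cust1', 'cust2', 'cust3', 'cust3'),
  start_month = as.Date(c('2016-03-01', '2017-08-01', '2016-03-01', '2017-07-01', '2017-10-01')),
  price       = c(29, 29, 59, 99, 59),
  end_month   = as.Date(c('2017-08-01', NA, '2017-09-01', '2017-09-01', NA))
)

merged <- df %>%
  group_by(customer) %>%
  mutate(
    # TRUE when this row continues the previous one: same price, and it
    # starts on the date the previous row ended (NA comparisons become FALSE)
    is_continuation = !is.na(lag(price)) & !is.na(lag(end_month)) &
      lag(price) == price & lag(end_month) == start_month,
    # Every non-continuation row opens a new "spell" for that customer
    spell = cumsum(!is_continuation)
  ) %>%
  group_by(customer, spell) %>%
  # Collapse each spell to one row: earliest start, latest end
  summarise(
    start_month = first(start_month),
    price       = first(price),
    end_month   = last(end_month),
    .groups     = "drop"
  ) %>%
  select(-spell)

print(merged)
```

Collapsing with summarise() rather than filter() means a run of several back-to-back renewals at the same price would also fold into a single row, which may or may not be what the business rule intends.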