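I have a data set in which the table sf has 2 fields: Customer id and Buy_date. Buy_date is unique within each customer, but each customer can have more than 3 different Buy_date values. I want to compute the difference between consecutive Buy_dates for each Customer, and the mean of those differences. How can I do this?
Example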
Customer Buy_date
1 2018/03/01
1 2018/03/19
1 2018/04/3
1 2018/05/10
2 2018/01/02
2 2018/02/10
2 2018/04/13
I would like the result for each customer in this format:
Customer mean
Answer 0 (score: 0)
Here is a dplyr solution.
Your data:
df <- data.frame(Customer = c(1,1,1,1,2,2,2), Buy_date = c("2018/03/01", "2018/03/19", "2018/04/3", "2018/05/10", "2018/01/02", "2018/02/10", "2018/04/13"))
Group by Customer, compute the mean Buy_date, and summarise:
library(dplyr)
df %>% group_by(Customer) %>% mutate(mean = mean(as.POSIXct(Buy_date))) %>% group_by(Customer, mean) %>% summarise()
Output:
# A tibble: 2 x 2
# Groups: Customer [?]
Customer mean
<dbl> <dttm>
1 1 2018-03-31 06:30:00
2 2 2018-02-17 15:40:00
Or, as @r2evans pointed out in the comments, for the mean number of days between consecutive Buy_dates:
df %>% group_by(Customer) %>% mutate(mean = mean(diff(as.POSIXct(Buy_date)))) %>% group_by(Customer, mean) %>% summarise()
Output:
# A tibble: 2 x 2
# Groups: Customer [?]
Customer mean
<dbl> <time>
1 1 23.3194444444444
2 2 50.4791666666667
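A note on the fractional values above: 23.319 and 50.479 correspond to total spans one hour short of 70 and 101 days, which is consistent with a daylight-saving shift in the local time zone used by as.POSIXct. This is an assumption, since the answer does not say which time zone was in effect. A minimal base R sketch, assuming the dates are plain calendar days, converts with as.Date instead so the gaps come out as whole days. The helper name mean_gap is hypothetical.

```r
# Same sample data as in the question
df <- data.frame(
  Customer = c(1, 1, 1, 1, 2, 2, 2),
  Buy_date = c("2018/03/01", "2018/03/19", "2018/04/3", "2018/05/10",
               "2018/01/02", "2018/02/10", "2018/04/13"),
  stringsAsFactors = FALSE
)

# Date has no time-of-day component, so DST cannot affect the differences
df$Buy_date <- as.Date(df$Buy_date, format = "%Y/%m/%d")

# diff() assumes chronological order within each customer
df <- df[order(df$Customer, df$Buy_date), ]

# Hypothetical helper: mean gap in days between consecutive dates
mean_gap <- function(d) mean(as.numeric(diff(d), units = "days"))

# One value per customer, named by Customer id
res <- sapply(split(df$Buy_date, df$Customer), mean_gap)
print(res)
# Customer 1: 70/3 (about 23.33) days; Customer 2: 50.5 days
```

The gaps for customer 1 are 18, 15 and 37 days, so the mean is 70/3; for customer 2 they are 39 and 62 days, so the mean is 50.5.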
Answer 1 (score: 0)
I am not sure about the desired output, but I think this is what you want.
The code:
library(dplyr)
library(zoo)
dat <- read.table(text =
"Customer Buy_date
1 2018/03/01
1 2018/03/19
1 2018/04/3
1 2018/05/10
2 2018/01/02
2 2018/02/10
2 2018/04/13", header = TRUE, stringsAsFactors = FALSE)
dat$Buy_date <- as.Date(dat$Buy_date)
dat %>% group_by(Customer) %>% mutate(diff_between = as.vector(diff(zoo(Buy_date), na.pad=TRUE)),
mean_days = mean(diff_between, na.rm = TRUE))
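Edit based on a user comment: since you said the column holds factors rather than characters, convert it to character first (for example with as.character()) before calling as.Date().
The code above produces: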
Customer Buy_date diff_between mean_days
<int> <date> <dbl> <dbl>
1 1 2018-03-01 NA 23.3
2 1 2018-03-19 18 23.3
3 1 2018-04-03 15 23.3
4 1 2018-05-10 37 23.3
5 2 2018-01-02 NA 50.5
6 2 2018-02-10 39 50.5
7 2 2018-04-13 62 50.5
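For the factor case mentioned in the edit, here is a minimal base R sketch, assuming Buy_date arrived as a factor (simulated here with stringsAsFactors = TRUE). It converts through as.character() and then rebuilds the diff_between and mean_days columns with ave() instead of dplyr and zoo. That is a swapped-in technique for illustration, not the answerer's code.

```r
# Simulate the factor case: Buy_date is read in as a factor
dat <- data.frame(
  Customer = c(1, 1, 1, 1, 2, 2, 2),
  Buy_date = c("2018/03/01", "2018/03/19", "2018/04/3", "2018/05/10",
               "2018/01/02", "2018/02/10", "2018/04/13"),
  stringsAsFactors = TRUE
)

# Go through character so the labels, not the integer codes, are parsed
dat$Buy_date <- as.Date(as.character(dat$Buy_date), format = "%Y/%m/%d")

# Make sure dates are in chronological order within each customer
dat <- dat[order(dat$Customer, dat$Buy_date), ]

# Days since the previous purchase of the same customer (NA for the first)
dat$diff_between <- ave(as.numeric(dat$Buy_date), dat$Customer,
                        FUN = function(x) c(NA, diff(x)))

# Per-customer mean gap, repeated on every row of that customer
dat$mean_days <- ave(dat$diff_between, dat$Customer,
                     FUN = function(x) mean(x, na.rm = TRUE))

print(dat)
```

The diff_between and mean_days columns should match the table above, up to the rounding the tibble printout applies.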