Question

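I have a table sf with two fields: Customer id and Buy_date. Buy_date is unique within each customer, but each customer can have more than three different Buy_date values. I want to calculate the difference between consecutive Buy_dates for each customer, and then the mean of those differences. How can I do this?

Example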

Customer   Buy_date
1          2018/03/01
1          2018/03/19
1          2018/04/3
1          2018/05/10
2          2018/01/02
2          2018/02/10
2          2018/04/13

I want one result per customer, in this format:

Customer  mean

Answer 1

Here is a dplyr solution.

Your data:

df <- data.frame(Customer = c(1,1,1,1,2,2,2), Buy_date = c("2018/03/01", "2018/03/19", "2018/04/3", "2018/05/10", "2018/01/02", "2018/02/10", "2018/04/13"))

Grouping by customer, computing the mean Buy_date, and summarising:

library(dplyr)
df %>% group_by(Customer) %>% mutate(mean = mean(as.POSIXct(Buy_date))) %>% group_by(Customer, mean) %>% summarise()

Output:

# A tibble: 2 x 2
# Groups:   Customer [?]
  Customer mean               
     <dbl> <dttm>             
1        1 2018-03-31 06:30:00
2        2 2018-02-17 15:40:00

Or, as @r2evans pointed out in the comments, the mean number of days between consecutive Buy_dates:

df %>% group_by(Customer) %>% mutate(mean = mean(diff(as.POSIXct(Buy_date)))) %>% group_by(Customer, mean) %>% summarise()

Output:

# A tibble: 2 x 2
# Groups:   Customer [?]
  Customer mean            
     <dbl> <time>          
1        1 23.3194444444444
2        2 50.4791666666667
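The fractional means above (23.319 rather than 70/3 ≈ 23.333) are consistent with a one-hour daylight-saving shift falling inside the date range when the dates are parsed as POSIXct in a local time zone. A minimal sketch that avoids this, assuming whole-day differences are what is wanted, parses the column with as.Date() and collapses to one row per customer with summarise(). The names gaps and mean_days are illustrative and do not come from the original answer.

```r
library(dplyr)

# Same sample data as in the question; strings kept as character
df <- data.frame(
  Customer = c(1, 1, 1, 1, 2, 2, 2),
  Buy_date = c("2018/03/01", "2018/03/19", "2018/04/3", "2018/05/10",
               "2018/01/02", "2018/02/10", "2018/04/13"),
  stringsAsFactors = FALSE
)

gaps <- df %>%
  # Dates carry no time-of-day, so DST cannot affect the differences
  mutate(Buy_date = as.Date(Buy_date, format = "%Y/%m/%d")) %>%
  # Sort so that diff() compares consecutive purchases per customer
  arrange(Customer, Buy_date) %>%
  group_by(Customer) %>%
  # diff() on Dates returns a difftime in days; convert to plain numbers
  summarise(mean_days = mean(as.numeric(diff(Buy_date))))

print(gaps)
```

This returns one row per customer, which matches the "Customer  mean" layout requested in the question.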

Answer 2

I'm not sure about the desired output, but I think this is what you want.

library(dplyr)
library(zoo)
dat <- read.table(text = 
"Customer   Buy_date
1          2018/03/01
1          2018/03/19
1          2018/04/3
1          2018/05/10
2          2018/01/02
2          2018/02/10
2          2018/04/13", header = TRUE, stringsAsFactors = FALSE)

# Parse the date strings so that differences come out in days
dat$Buy_date <- as.Date(dat$Buy_date)

# Per customer: gap to the previous purchase (NA for the first one) and the mean gap
dat %>% group_by(Customer) %>% mutate(diff_between = as.vector(diff(zoo(Buy_date), na.pad=TRUE)), 
                                      mean_days = mean(diff_between, na.rm = TRUE))

This produces:

  Customer Buy_date   diff_between mean_days
     <int> <date>            <dbl>     <dbl>
1        1 2018-03-01           NA      23.3
2        1 2018-03-19           18      23.3
3        1 2018-04-03           15      23.3
4        1 2018-05-10           37      23.3
5        2 2018-01-02           NA      50.5
6        2 2018-02-10           39      50.5
7        2 2018-04-13           62      50.5

Edit based on user comments:

Since you said your Buy_date column is a factor rather than a character vector, it needs to be converted first.
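One common way to handle a factor date column, sketched here as an assumption rather than the answerer's exact code, is to go through as.character() before as.Date(). The vector buy_date below is a hypothetical stand-in for the asker's real column.

```r
# Hypothetical factor column standing in for the asker's real data
buy_date <- factor(c("2018/03/01", "2018/03/19", "2018/04/3"))

# Convert the factor labels to character, then parse them as dates
dates <- as.Date(as.character(buy_date), format = "%Y/%m/%d")

# Differences between consecutive dates, in days
day_gaps <- as.numeric(diff(dates))
print(day_gaps)
```

After this conversion the column can be used in the grouped diff() and mean() calculation in the same way as a character column.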
