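I am trying to filter an orders table down to each person's first year of orders.
My data is in the following format, where each row represents one order. I have also added customer-level columns holding the date of the customer's first order (`Recruitment Date`), the date one year after recruitment (`1st Year Since Recruitment`) and the date two years after recruitment; the last column is the payment amount for the current order.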
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1876202 obs. of 6 variables:
$ Brand_Acc : chr "B000000001" "B000000002" "B000000002" "B000000002" ...
$ salesdate : Date, format: "2008-03-10" "2008-02-19" "2008-07-14" "2010-08-25" ...
$ Recruitment Date : Date, format: "2008-03-10" "2008-02-19" NA NA ...
$ 1st Year Since Recruitment: Date, format: "2009-03-10" "2009-02-19" NA NA ...
$ 2nd Year Since Recruitment: Date, format: "2010-03-10" "2010-02-19" NA NA ...
$ TotalDiscount : num 97.9 349.9 184.9 284.9 348.9 ...
I would like to return a data frame that captures each customer's first-year orders.
I tried the following:
df %>%
  group_by(Brand_Acc) %>%
  filter(salesdate, between(`Recruitment Date`, `1st Year Since Recruitment`))
But I get this error:
Error in filter_impl(.data, quo) : Evaluation error: argument "right" is missing, with no default.
What is the correct way to do this?
Edit: here are the first 5 rows:
dput(df)
structure(list(Brand_Acc = c("B000000001", "B000000002", "B000000002",
"B000000002", "B000000006"), salesdate = structure(c(13948, 13928,
14074, 14846, 13934), class = "Date"), ordertype = c("Recruitment",
"Recruitment", "Conversion", "Active Order", "Recruitment"),
actv_channel = c("MainMail", "MainMail", "Outbound-Other",
"MainMail", "MainMail"), TotalDiscount = c(97.87, 349.88,
184.94, 284.94, 348.9), campaignparentid = c("9017", "9017",
"9035", "9557", "9017"), BrandAccount_Brand = c("wp", "wp",
"wp", "wp", "wp"), recrtype = c("STNRD", "STNRD", "STNRD",
"STNRD", "STNRD"), POA_CODE = structure(c(1937L, 2302L, 2302L,
2302L, 466L), .Label = c("0", "200", "800", "801", "804"), class = "factor"),
`Recruitment Date` = structure(c(13948, 13928, NA, NA, 13934
), class = "Date"), `1st Year Since Recruitment` = structure(c(14313,
14294, NA, NA, 14300), class = "Date"), `2nd Year Since Recruitment` = structure(c(14678,
14659, NA, NA, 14665), class = "Date"), `3rd Year Since Recruitment` = structure(c(15043,
15024, NA, NA, 15030), class = "Date")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
Answer 0 (score: 2)
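Surprisingly, `between` is not vectorised in its `left` and `right` arguments, which you might expect given that it describes itself as a shortcut for `x >= left & x <= right`. We just have to take the long way round: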
Created on 2019-02-13 by the reprex package (v0.2.1)
library(tidyverse)
df <- structure(list(Brand_Acc = c("B000000001", "B000000002", "B000000002", "B000000002", "B000000006"), salesdate = structure(c(13948, 13928, 14074, 14846, 13934), class = "Date"), ordertype = c("Recruitment", "Recruitment", "Conversion", "Active Order", "Recruitment"), actv_channel = c("MainMail", "MainMail", "Outbound-Other", "MainMail", "MainMail"), TotalDiscount = c(97.87, 349.88, 184.94, 284.94, 348.9), campaignparentid = c("9017", "9017", "9035", "9557", "9017"), BrandAccount_Brand = c("wp", "wp", "wp", "wp", "wp"), recrtype = c("STNRD", "STNRD", "STNRD", "STNRD", "STNRD"), POA_CODE = structure(c(1937L, 2302L, 2302L, 2302L, 466L), .Label = c("0", "200", "800", "801", "804"), class = "factor"), `Recruitment Date` = structure(c(13948, 13928, NA, NA, 13934), class = "Date"), `1st Year Since Recruitment` = structure(c(14313, 14294, NA, NA, 14300), class = "Date"), `2nd Year Since Recruitment` = structure(c(14678, 14659, NA, NA, 14665), class = "Date"), `3rd Year Since Recruitment` = structure(c(15043, 15024, NA, NA, 15030), class = "Date")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L))
df %>%
  group_by(Brand_Acc) %>%
  filter(salesdate >= `Recruitment Date` & salesdate <= `1st Year Since Recruitment`)
#> # A tibble: 3 x 13
#> # Groups: Brand_Acc [3]
#> Brand_Acc salesdate ordertype actv_channel TotalDiscount
#> <chr> <date> <chr> <chr> <dbl>
#> 1 B0000000… 2008-03-10 Recruitm… MainMail 97.9
#> 2 B0000000… 2008-02-19 Recruitm… MainMail 350.
#> 3 B0000000… 2008-02-25 Recruitm… MainMail 349.
#> # … with 8 more variables: campaignparentid <chr>,
#> # BrandAccount_Brand <chr>, recrtype <chr>, POA_CODE <fct>, `Recruitment
#> # Date` <date>, `1st Year Since Recruitment` <date>, `2nd Year Since
#> # Recruitment` <date>, `3rd Year Since Recruitment` <date>
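Note that the output above contains only the recruitment orders: the later orders have `NA` in the recruitment columns, so the comparison evaluates to `NA` and `filter()` drops those rows. If the intent is for every order to be compared against its customer's recruitment window, one possible approach is to carry the dates down within each customer first. The following is only a sketch under that assumption; the shortened column names `recruited` and `first_year_end` are hypothetical stand-ins for `Recruitment Date` and `1st Year Since Recruitment`, and it assumes the recruitment row comes first for each customer.

```r
library(dplyr)
library(tidyr)

# Toy data mirroring the first five rows of the question
# (column names shortened for illustration; they are not the asker's names)
orders <- tibble(
  Brand_Acc = c("B000000001", "B000000002", "B000000002",
                "B000000002", "B000000006"),
  salesdate = as.Date(c("2008-03-10", "2008-02-19", "2008-07-14",
                        "2010-08-25", "2008-02-25")),
  recruited = as.Date(c("2008-03-10", "2008-02-19", NA, NA, "2008-02-25")),
  first_year_end = as.Date(c("2009-03-10", "2009-02-19", NA, NA, "2009-02-25"))
)

first_year <- orders %>%
  group_by(Brand_Acc) %>%
  # carry the customer-level dates down to the rows where they are NA
  fill(recruited, first_year_end) %>%
  # keep orders placed within the first year after recruitment
  filter(salesdate >= recruited, salesdate <= first_year_end) %>%
  ungroup()

first_year
```

With this toy data the 2008-07-14 order for B000000002 is kept as well, while the 2010-08-25 order is still excluded.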
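There is also a syntax error in your use of `between`, although it is no longer relevant: the error says `right` is missing because `salesdate` was passed to `filter()` instead of being the first argument of `between()`.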
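To illustrate the argument order, here is a small sketch with made-up numbers (the vectors `x`, `lower` and `upper` are hypothetical, not taken from the question). The value being tested goes first in `between()`, followed by the two bounds; the explicit comparison is the form that also handles one pair of bounds per row.

```r
library(dplyr)

x <- c(1, 5, 10)

# Value to test first, then the lower and upper bound (scalars here)
res_between <- between(x, 2, 8)

# Explicit comparison: works element by element when the bounds
# are vectors of the same length as x
lower <- c(0, 6, 9)
upper <- c(2, 8, 12)
res_explicit <- x >= lower & x <= upper

res_between   # FALSE  TRUE FALSE
res_explicit  #  TRUE FALSE  TRUE
```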