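My goal is to create a user-defined function that takes a data frame, a month or year as x, and a product category as y, and returns a data frame with the top 10 customers grouped by city.
I don't want to pass the city as an argument.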
toptencust <- function(df,x,y){
  library(magrittr)
  library(dplyr)
  ifelse(is.character(x)
    , df %>%
        select_(City,Amount,Customer,Product,Year,month) %>%
        group_by_(City,Customer) %>%
        filter_(month==x & Product==y) %>%
        summarise_(Tot_repay=sum(Amount,na.rm=T)) %>%
        top_n(n=10)
    , df %>%
        select_(City,Amount,Customer,Product,Year,month) %>%
        group_by_(City,Customer) %>%
        filter_(Year==x & Product==y) %>%
        summarise_(Tot_repay=sum(Amount,na.rm=T)) %>%
        top_n(n=10)
  )
}
My dataset looks like this:
df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
Customer Date Amount month City Product Year
A1 12/01/04 495415 January BANGALORE Gold 2004
A1 03/01/04 245899 January BANGALORE Gold 2004
A1 15/01/04 259490 January BANGALORE Gold 2004
A1 25/01/04 437555 January BANGALORE Gold 2004
A1 17/01/05 165973 January BANGALORE Gold 2005
A1 23/02/05 365367 February BANGALORE Gold 2005
A1 01/02/05 14473 February BANGALORE Gold 2005
A8 05/02/04 100002 February PATNA Silver 2004
A9 28/02/05 100003 February CHENNAI Silver 2005
A10 16/02/05 48759 February CALCUTTA Gold 2005
A11 23/02/05 208318 February COCHIN Gold 2005
A12 03/02/05 150281 February BOMBAY Gold 2005
A13 04/02/06 339078 February BANGALORE Gold 2006
A14 25/03/06 137835 March BANGALORE Gold 2006
A15 31/03/06 437120 March CALCUTTA Gold 2006
A16 23/03/06 103924 March COCHIN Gold 2006
A17 19/03/04 408467 March BOMBAY Gold 2004
A18 05/03/06 100000 March BANGALORE Silver 2006
A19 04/04/05 10000 April BANGALORE Platinum 2005
A20 30/04/06 10001 April CALCUTTA Platinum 2006
A21 25/04/04 10002 April COCHIN Platinum 2004
A22 19/04/06 100000 April BOMBAY Silver 2006
A23 06/04/04 80346 April BANGALORE Silver 2004
A24 27/04/05 100002 April DELHI Silver 2005
A25 05/05/04 100003 May COCHIN Silver 2004
A26 06/05/06 470982 May PATNA Gold 2006
A27 07/05/05 357376 May CHENNAI Gold 2005
A28 08/05/06 326050 May TRIVANDRUM Gold 2006
A29 09/05/05 215083 May CALCUTTA Gold 2005
A30 10/05/06 481343 May BANGALORE Gold 2006")
My goal is to get output like the following.
When I run this function, I get the following error:
toptencust(df,'February',2014)
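Error in sum(Amount, na.rm = T) : invalid 'type' (symbol) of argument
I can't figure out what the problem is. Can anyone help?
Answer 0 (score: 0)
Running your example, I get a different error:
Error in compat_lazy_dots(.dots, caller_env(), ...) : object 'City' not found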
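This happens because you are using the "escape hatch" functions, that is, the underscore-suffixed verbs such as select_ and so on. You probably did this because you need to use the variable x inside filter_. But now the other names, such as Product, which are names inside the data frame, are treated as variables as well!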
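The lookup problem can be reproduced without dplyr. The following is a minimal base-R sketch, not the internals of the underscore verbs: the data frame df_small is made up for illustration, and the snippet assumes a session in which no object named City exists.

```r
# Made-up two-row data frame, used only for illustration.
df_small <- data.frame(City = c("PATNA", "DELHI"), Amount = c(1, 2),
                       stringsAsFactors = FALSE)

# Standard evaluation: City is looked up as an ordinary R object.
# No such object exists here, so R signals an error, which we capture.
lookup <- tryCatch(City, error = function(e) conditionMessage(e))
print(lookup)

# Evaluating the same name inside the data frame finds the column instead.
cities <- eval(quote(City), df_small)
print(cities)
```

The first lookup fails for the same reason as the error above: the name is resolved in the calling environment instead of in the data frame.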
Here is a tutorial on the old "escape hatch" functions.
Nowadays, this is solved somewhat differently, using the !! operator. Have a look at the vignette Programming with dplyr.
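As a rough sketch of what the !! approach could look like for this question: the function name top_ten_customers, the helper variable period_col, and the data frame sample_df are my own illustrative choices, and the code assumes dplyr 1.0.0 or later (for slice_max() and the .groups argument) with rlang installed. It is one possible rewrite, not the only one.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 (for slice_max() and .groups)

# Hypothetical rewrite of toptencust(); the name is illustrative.
top_ten_customers <- function(df, x, y) {
  # A character x is treated as a month name, anything else as a year.
  period_col <- if (is.character(x)) "month" else "Year"

  df %>%
    # !! inlines the values of x and y, so they cannot be mistaken for columns;
    # !!rlang::sym() turns the column-name string into a column reference.
    filter((!!rlang::sym(period_col)) == !!x, Product == !!y) %>%
    group_by(City, Customer) %>%
    summarise(Tot_repay = sum(Amount, na.rm = TRUE), .groups = "drop_last") %>%
    # Still grouped by City here, so this keeps up to 10 customers per city.
    slice_max(Tot_repay, n = 10)
}

# A few rows taken from the question's data, for a quick check.
sample_df <- data.frame(
  Customer = c("A1", "A1", "A13", "A10", "A8"),
  Amount = c(365367, 14473, 339078, 48759, 100002),
  month = "February",
  City = c("BANGALORE", "BANGALORE", "BANGALORE", "CALCUTTA", "PATNA"),
  Product = c("Gold", "Gold", "Gold", "Gold", "Silver"),
  Year = c(2005, 2005, 2006, 2005, 2004),
  stringsAsFactors = FALSE
)

result <- top_ten_customers(sample_df, "February", "Gold")
print(result)
```

Filtering before grouping avoids summing rows that would be discarded anyway. Calling top_ten_customers(sample_df, 2005, "Gold") takes the Year branch instead. The .data pronoun, as in .data[[period_col]], would be another way to refer to a column by name.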