Passing multiple arguments in a user-defined function in R?

Time: 2018-06-07 07:18:39

Tags: r dplyr

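My goal is to create a user-defined function that takes a data frame, a month or year as x, and a product category as y, and returns a data frame with the top 10 customers grouped by city.

I don't want to pass the city as an argument.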

 toptencust <- function(df,x,y){
  library(magrittr)
  library(dplyr)

  ifelse(is.character(x)
    , df %>% 
      select_(City,Amount,Customer,Product,Year,month) %>%
      group_by_(City,Customer) %>%
      filter_(month==x & Product==y) %>% 
      summarise_(Tot_repay=sum(Amount,na.rm=T)) %>% 
      top_n(n=10)
    , df %>% 
      select_(City,Amount,Customer,Product,Year,month) %>%
      group_by_(City,Customer) %>%filter_(Year==x& Product==y) %>%
      summarise_(Tot_repay=sum(Amount,na.rm=T)) %>% 
      top_n(n=10)
    )

}

My dataset looks like this:

df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
Customer    Date        Amount  month     City        Product  Year
A1          12/01/04    495415  January   BANGALORE   Gold     2004
A1          03/01/04    245899  January   BANGALORE   Gold     2004
A1          15/01/04    259490  January   BANGALORE   Gold     2004
A1          25/01/04    437555  January   BANGALORE   Gold     2004
A1          17/01/05    165973  January   BANGALORE   Gold     2005
A1          23/02/05    365367  February  BANGALORE   Gold     2005
A1          01/02/05    14473   February  BANGALORE   Gold     2005
A8          05/02/04    100002  February  PATNA       Silver   2004
A9          28/02/05    100003  February  CHENNAI     Silver   2005
A10         16/02/05    48759   February  CALCUTTA    Gold     2005
A11         23/02/05    208318  February  COCHIN      Gold     2005
A12         03/02/05    150281  February  BOMBAY      Gold     2005
A13         04/02/06    339078  February  BANGALORE   Gold     2006
A14         25/03/06    137835  March     BANGALORE   Gold     2006
A15         31/03/06    437120  March     CALCUTTA    Gold     2006
A16         23/03/06    103924  March     COCHIN      Gold     2006
A17         19/03/04    408467  March     BOMBAY      Gold     2004
A18         05/03/06    100000  March     BANGALORE   Silver   2006
A19         04/04/05    10000   April     BANGALORE   Platinum 2005
A20         30/04/06    10001   April     CALCUTTA    Platinum 2006
A21         25/04/04    10002   April     COCHIN      Platinum 2004
A22         19/04/06    100000  April     BOMBAY      Silver   2006
A23         06/04/04    80346   April     BANGALORE   Silver   2004
A24         27/04/05    100002  April     DELHI       Silver   2005
A25         05/05/04    100003  May       COCHIN      Silver   2004
A26         06/05/06    470982  May       PATNA       Gold     2006
A27         07/05/05    357376  May       CHENNAI     Gold     2005
A28         08/05/06    326050  May       TRIVANDRUM  Gold     2006
A29         09/05/05    215083  May       CALCUTTA    Gold     2005
A30         10/05/06    481343  May       BANGALORE   Gold     2006")

My goal is to get output like the following:

Output required

When I run this function, I get the following error:

toptencust(df,'February',2014)

Error in sum(Amount, na.rm = T) : invalid 'type' (symbol) of argument

I can't figure out what the problem is. Can you help?

1 Answer:

Answer 0: (score: 0)

Running your example, I get a different error:

Error in compat_lazy_dots(.dots, caller_env(), ...) : object 'City' not found

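This is because you used the "escape hatch" functions select_, group_by_, filter_ and so on. You probably did so because you need to use the variable x inside filter_(month==x & Product==y). But now the other names, such as Product, which are names inside the data frame, are also treated as variables!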
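To make the name confusion concrete, here is a minimal sketch of the two lookup rules. The `sales` data frame is a hypothetical two-row example and not part of the question; the point is only that a bare column name resolves inside a data-masking verb such as `filter()`, while the same name evaluated as an ordinary R expression is looked up as a variable and fails.

```r
library(dplyr)

# Hypothetical two-row data frame, used only for illustration.
sales <- data.frame(City = c("PATNA", "DELHI"), Amount = c(100002, 100003))

# Data masking: filter() looks `City` up inside `sales`.
patna <- filter(sales, City == "PATNA")

# Ordinary evaluation: outside a data-masking verb, `City` is just a
# variable name, and no such variable exists in this environment.
msg <- tryCatch(
  {
    City
    "no error"
  },
  error = function(e) conditionMessage(e)
)
```

Here `patna` holds the single PATNA row, while `msg` holds an "object not found" message of the same kind as the error reported above.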

Here is a tutorial on the old "escape hatch" functions.

Nowadays this is solved differently, using the !! operator. Have a look at the vignette Programming with dplyr.
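As a rough sketch of what that could look like for the function in the question, assuming dplyr 1.0.0 or later (for `slice_max()` and the `.groups` argument): the function name `top_ten_customers` and the argument names `period`, `product` and `n` are my own, not from the question. It also swaps `ifelse()` for a plain `if`, because `ifelse()` is vectorised over its test and does not return a whole data frame.

```r
library(dplyr)

# Hypothetical rewrite of toptencust(); function and argument names are illustrative.
top_ten_customers <- function(df, period, product, n = 10) {
  # A character `period` is treated as a month name, anything else as a year.
  period_col <- if (is.character(period)) "month" else "Year"

  df %>%
    # .data[[...]] looks a column up by its name; !! forces the function
    # arguments, so they cannot be confused with columns of `df`.
    filter(.data[[period_col]] == !!period, Product == !!product) %>%
    group_by(City, Customer) %>%
    summarise(Tot_repay = sum(Amount, na.rm = TRUE), .groups = "drop_last") %>%
    # Still grouped by City here, so this keeps the top n customers per city.
    slice_max(Tot_repay, n = n) %>%
    ungroup()
}
```

With the data frame from the question, `top_ten_customers(df, "February", "Gold")` filters on the month column and `top_ten_customers(df, 2005, "Gold")` on the Year column. Note that the product has to be a value that occurs in the Product column, such as "Gold" or "Silver".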