Loop through a data frame and add a column with specific logic in R

Time: 2018-04-17 21:43:15

Tags: r loops dataframe iteration

I have a data frame that contains information about sales branches, customers, and sales.

branch <- c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","LA","LA","LA","LA","LA","LA","LA","Tampa","Tampa","Tampa","Tampa","Tampa","Tampa","Tampa","Tampa")

customer <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21)

sales <- c(33816,24534,47735,1467,39389,30659,21074,20195,45165,37606,38967,41681,47465,3061,23412,22993,34738,19408,11637,36234,23809)


data   <- data.frame(branch, customer, sales)

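What I need to do is iterate over each branch, take each customer in that branch, and divide that customer's sales by the branch total. I need this to find out how much each customer contributes to the total sales of their branch. For example, for customer 1 I want to compute 33816/177600 and store that value in a new column (177600 is the total for the Chicago branch).

I have tried writing a function that walks through every row in a for loop, but I don't know how to do this at the branch level. Any guidance is appreciated.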

2 answers:

Answer 0: (score: 0)

We can use dplyr::group_by to group the rows by branch and compute each branch's total sales, then dplyr::mutate to store each customer's share of that total in a new column.
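A minimal sketch of the group_by/mutate approach, assuming the dplyr package is installed. The column name `contribution` and the object name `result` are chosen here for illustration and do not come from the answer; the data is the same as in the question, built more compactly with rep().

```r
library(dplyr)

# Same data as in the question.
branch   <- rep(c("Chicago", "LA", "Tampa"), times = c(6, 7, 8))
customer <- 1:21
sales    <- c(33816, 24534, 47735, 1467, 39389, 30659,
              21074, 20195, 45165, 37606, 38967, 41681, 47465,
              3061, 23412, 22993, 34738, 19408, 11637, 36234, 23809)
data <- data.frame(branch, customer, sales)

# Within each branch, divide every row's sales by the branch total.
# `contribution` is a hypothetical column name.
result <- data %>%
  group_by(branch) %>%
  mutate(contribution = sales / sum(sales)) %>%
  ungroup()

print(result, n = 21)
```

Because mutate() runs inside each group, sum(sales) is the branch total rather than the grand total, so no explicit loop over branches is needed.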

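Answer 1: (score: 0)

Consider base R's ave to build the new column with an inline aggregation. This also handles the case where the same customer has multiple records within the same branch: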

data$customer_contribution <- ave(data$sales, data$customer, FUN=sum) / 
                              ave(data$sales, data$branch, FUN=sum)

data
#     branch customer sales customer_contribution
# 1  Chicago        1 33816           0.190405405
# 2  Chicago        2 24534           0.138141892
# 3  Chicago        3 47735           0.268778153
# 4  Chicago        4  1467           0.008260135
# 5  Chicago        5 39389           0.221784910
# 6  Chicago        6 30659           0.172629505
# 7       LA        7 21074           0.083576241
# 8       LA        8 20195           0.080090263
# 9       LA        9 45165           0.179117441
# 10      LA       10 37606           0.149139610
# 11      LA       11 38967           0.154537126
# 12      LA       12 41681           0.165300433
# 13      LA       13 47465           0.188238887
# 14   Tampa       14  3061           0.017462291
# 15   Tampa       15 23412           0.133560003
# 16   Tampa       16 22993           0.131169705
# 17   Tampa       17 34738           0.198172193
# 18   Tampa       18 19408           0.110718116
# 19   Tampa       19 11637           0.066386372
# 20   Tampa       20 36234           0.206706524
# 21   Tampa       21 23809           0.135824795

Or, less verbosely:

data$customer_contribution <- with(data, ave(sales, customer, FUN=sum) / 
                                         ave(sales, branch, FUN=sum))
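One caveat: the numerator above groups by customer only, so if a customer ID appeared under more than one branch, that customer's sales would be summed across branches. A hedged sketch that groups the numerator by both branch and customer follows; the data frame `d` and its values are invented for illustration and are not part of the question.

```r
# Hypothetical data: customer 1 has two records in Chicago and one in LA.
d <- data.frame(
  branch   = c("Chicago", "Chicago", "Chicago", "LA", "LA"),
  customer = c(1, 1, 2, 1, 3),
  sales    = c(100, 300, 600, 50, 150)
)

# Numerator: the customer's total sales within the branch.
# Denominator: the branch's total sales.
d$customer_contribution <- with(d, ave(sales, branch, customer, FUN = sum) /
                                   ave(sales, branch, FUN = sum))

print(d)
```

ave() accepts several grouping variables and returns a vector as long as its input, which is why the result can be assigned directly as a column. With the question's data, where every customer belongs to exactly one branch, both versions give the same result.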