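I have a dataset containing each sales associate's sales records, broken down by country, date (year and month), and customer segment. I can't find a way to aggregate the data into a wide format in which each row holds only the date and the associate, together with the computed total sales revenue and each customer segment's percentage of that total.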
Here is the dataset as CSV:
Year,Month,Country,Associate,Sales Revenue,Customer Segment
2015,1,USA,Bill,20,Enterprise
2015,1,USA,Bill,10,Enterprise
2015,1,Germany,Bill,5,Consumer
2015,1,USA,Bill,5,Enterprise
2015,1,Germany,Ted,5,Consumer
2015,1,USA,Bill,10,Consumer
2015,1,Germany,Bill,5,Consumer
2015,1,Germany,Ted,20,Enterprise
2015,1,Germany,Ted,20,Consumer
Answer 0 (score: 2)
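There are a few ways to do this, but basically you need to group_by the columns whose values should stay fixed within each output row, then summarise to create the new variables.
You can write each new column out by hand, subsetting Sales Revenue to the segment you are interested in: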
library(tidyverse)

# Read the sample data from an inline string.
# With readr 2.0 or later, wrapping the string in I() marks it as literal data.
df <- read_csv('Year,Month,Country,Associate,Sales Revenue,Customer Segment
2015,1,USA,Bill,20,Enterprise
2015,1,USA,Bill,10,Enterprise
2015,1,Germany,Bill,5,Consumer
2015,1,USA,Bill,5,Enterprise
2015,1,Germany,Ted,5,Consumer
2015,1,USA,Bill,10,Consumer
2015,1,Germany,Bill,5,Consumer
2015,1,Germany,Ted,20,Enterprise
2015,1,Germany,Ted,20,Consumer')

df %>%
    group_by(Year, Month, Country, Associate) %>%
    # later summarise() expressions can reuse `Total Sales Revenue` defined just above them
    summarise(`Total Sales Revenue` = sum(`Sales Revenue`),
              `Enterprise Sales %` = sum(`Sales Revenue`[`Customer Segment` == 'Enterprise']) /
                  `Total Sales Revenue` * 100,
              `Consumer Sales %` = sum(`Sales Revenue`[`Customer Segment` == 'Consumer']) /
                  `Total Sales Revenue` * 100)
#> # A tibble: 3 x 7
#> # Groups: Year, Month, Country [?]
#> Year Month Country Associate `Total Sales Revenue` `Enterprise Sales %`
#> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 2015 1 Germany Bill 10 0
#> 2 2015 1 Germany Ted 45 44.4
#> 3 2015 1 USA Bill 45 77.8
#> # ... with 1 more variable: `Consumer Sales %` <dbl>
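The question asks for one row per date and associate, whereas the grouping above also keeps Country. A minimal sketch of the coarser grouping follows, assuming the per-country split is not needed. The name by_associate is illustrative, base read.csv(text = ...) is used only to keep the example self-contained, and .groups = "drop" assumes dplyr 1.0 or later.

```r
library(dplyr)

# Same sample data as in the question; check.names = FALSE keeps the spaces in column names.
df <- read.csv(text = "Year,Month,Country,Associate,Sales Revenue,Customer Segment
2015,1,USA,Bill,20,Enterprise
2015,1,USA,Bill,10,Enterprise
2015,1,Germany,Bill,5,Consumer
2015,1,USA,Bill,5,Enterprise
2015,1,Germany,Ted,5,Consumer
2015,1,USA,Bill,10,Consumer
2015,1,Germany,Bill,5,Consumer
2015,1,Germany,Ted,20,Enterprise
2015,1,Germany,Ted,20,Consumer", check.names = FALSE)

# Group by date and associate only, so revenue is pooled across countries.
by_associate <- df %>%
    group_by(Year, Month, Associate) %>%
    summarise(`Total Sales Revenue` = sum(`Sales Revenue`),
              `Enterprise Sales %` = sum(`Sales Revenue`[`Customer Segment` == "Enterprise"]) /
                  `Total Sales Revenue` * 100,
              `Consumer Sales %` = sum(`Sales Revenue`[`Customer Segment` == "Consumer"]) /
                  `Total Sales Revenue` * 100,
              .groups = "drop")

print(by_associate)
```

On the sample data this should give one row each for Bill and Ted, with Bill's total pooled across Germany and the USA.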
Alternatively, use tidyr::spread to do this programmatically (which scales better to more segments):
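df %>%
    janitor::clean_names() %>%    # e.g. `Sales Revenue` becomes sales_revenue
    group_by(year, month, country, associate, customer_segment) %>%
    summarise(revenue = sum(sales_revenue)) %>%    # drops the customer_segment grouping level
    mutate(percent = revenue / sum(revenue) * 100) %>%    # share within each associate's total
    spread(customer_segment, percent) %>%    # one column per segment
    summarise_all(sum, na.rm = TRUE)    # collapse to one row per associate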
#> # A tibble: 3 x 7
#> # Groups: year, month, country [?]
#> year month country associate revenue Consumer Enterprise
#> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 2015 1 Germany Bill 10 100 0
#> 2 2015 1 Germany Ted 45 55.6 44.4
#> 3 2015 1 USA Bill 45 22.2 77.8
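In tidyr 1.0.0 and later, spread is superseded by pivot_wider. Below is a sketch of the same reshaping under that assumption, not the answer's original code. The names total and wide are illustrative, base read.csv(text = ...) stands in for read_csv to keep the example self-contained, and .groups = "drop" assumes dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question; check.names = FALSE keeps the spaces in column names.
df <- read.csv(text = "Year,Month,Country,Associate,Sales Revenue,Customer Segment
2015,1,USA,Bill,20,Enterprise
2015,1,USA,Bill,10,Enterprise
2015,1,Germany,Bill,5,Consumer
2015,1,USA,Bill,5,Enterprise
2015,1,Germany,Ted,5,Consumer
2015,1,USA,Bill,10,Consumer
2015,1,Germany,Bill,5,Consumer
2015,1,Germany,Ted,20,Enterprise
2015,1,Germany,Ted,20,Consumer", check.names = FALSE)

wide <- df %>%
    # revenue per associate and segment
    group_by(Year, Month, Country, Associate, `Customer Segment`) %>%
    summarise(revenue = sum(`Sales Revenue`), .groups = "drop") %>%
    # each segment's share of the associate's total
    group_by(Year, Month, Country, Associate) %>%
    mutate(total = sum(revenue),
           percent = revenue / total * 100) %>%
    ungroup() %>%
    select(-revenue) %>%
    # one column per segment; segments with no sales are filled with 0
    pivot_wider(names_from = `Customer Segment`,
                values_from = percent,
                values_fill = 0)

print(wide)
```

Because values_fill supplies 0 for missing segment combinations, the final summarise_all step used with spread is not needed in this variant.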