I have a dataset containing sales for several years. Here is an example:
yr_2008 <- data.frame(agent = c("agent1", "agent4", "agent1", "agent1", "agent1", "agent4"), sales = c(100, 200, 300, 130, 200, 400), year = 2008)
yr_2009 <- data.frame(agent = c("agent1", "agent3", "agent4", "agent1", "agent3", "agent4", "agent1", "agent3", "agent4"), sales = c(200, 500, 200, 200, 100, 100, 200, 300, 200), year = 2009)
yr_2010 <- data.frame(agent = c("agent1", "agent4", "agent2", "agent2", "agent2", "agent4"), sales = c(130, 300, 100, 200, 100, 200), year = 2010)
sales <- rbind(yr_2008, yr_2009, yr_2010)
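What is an appropriate way to generate a summary for each person for each year? For example, for each year I would like to see how many sales a person made and the total amount sold. If a person was not there in a given year, the value should be NA. For 2008, for example, I would like this as the output: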
sales_output <- data.frame(agent = c("agent1", "agent2", "agent3", "agent4"),
                           yr08_transaction = c(4, NA, NA, 2),  # agent1 has 4 rows in 2008 (100 + 300 + 130 + 200 = 730)
                           yr08_sales = c(730, NA, NA, 600))
I would also like all of this information combined into a single table, expanded like this:
sales_output <- data.frame(agent = c("agent1", "agent2", "agent3", "agent4"),
                           yr08_transaction = c(4, NA, NA, 2),
                           yr08_sales = c(730, NA, NA, 600),
                           yr09_transaction = c(3, 0, 3, 3),
                           yr09_sales = c(600, 0, 900, 500),
                           yr10_transaction = c(1, 3, 0, 2),
                           yr10_sales = c(130, 400, 0, 500))
sales_output
   agent yr08_transaction yr08_sales yr09_transaction yr09_sales yr10_transaction yr10_sales
1 agent1                4        730                3        600                1        130
2 agent2               NA         NA                0          0                3        400
3 agent3               NA         NA                3        900                0          0
4 agent4                2        600                3        500                2        500
Thanks!
Answer 0 (score: 1)
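Here is a dplyr workflow. If you group the data by year and agent, you can compute each agent's total sales and number of transactions per year. To get the result into wide format, first use gather to make it longer, putting sales and transactions into a single column; then unite the year with the measure, so that you have entries such as "2009_sales"; and finally spread it back to wide. spread also fills missing agent/year combinations with NA.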
library(tidyverse)
yr_2008 <- data.frame(agent = c("agent1", "agent4", "agent1", "agent1", "agent1", "agent4"), sales = c(100, 200, 300, 130, 200, 400), year = 2008)
yr_2009 <- data.frame(agent = c("agent1", "agent3", "agent4", "agent1", "agent3", "agent4", "agent1", "agent3", "agent4"), sales = c(200, 500, 200, 200, 100, 100, 200, 300, 200), year = 2009)
yr_2010 <- data.frame(agent = c("agent1", "agent4", "agent2", "agent2", "agent2", "agent4"), sales = c(130, 300, 100, 200, 100, 200), year = 2010)
sales <- rbind(yr_2008, yr_2009, yr_2010)
sales_summary <- sales %>%
  group_by(year, agent) %>%
  summarise(sales = sum(sales), transactions = n()) %>%
  gather(key = type, value = value, sales, transactions) %>%
  unite("yr", year, type) %>%
  spread(key = yr, value = value, sep = "")
sales_summary
#> # A tibble: 4 x 7
#>   agent  yr2008_sales yr2008_transactions yr2009_sales yr2009_transactions
#>   <fct>         <dbl>               <dbl>        <dbl>               <dbl>
#> 1 agent1          730                   4          600                   3
#> 2 agent4          600                   2          500                   3
#> 3 agent3           NA                  NA          900                   3
#> 4 agent2           NA                  NA           NA                  NA
#> # ... with 2 more variables: yr2010_sales <dbl>, yr2010_transactions <dbl>
Created on 2018-05-13 by the reprex package (v0.2.0).
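In current tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. As a rough sketch of the same group, summarise, and widen idea (not part of the original answer), the version below assumes dplyr >= 1.0.0 and tidyr >= 1.0.0. The yr helper column and the names_glue pattern are illustrative choices made here to reproduce the yr08_transaction style of column names from the question.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, built in one step
sales <- data.frame(
  agent = c("agent1", "agent4", "agent1", "agent1", "agent1", "agent4",
            "agent1", "agent3", "agent4", "agent1", "agent3", "agent4",
            "agent1", "agent3", "agent4",
            "agent1", "agent4", "agent2", "agent2", "agent2", "agent4"),
  sales = c(100, 200, 300, 130, 200, 400,
            200, 500, 200, 200, 100, 100, 200, 300, 200,
            130, 300, 100, 200, 100, 200),
  year  = rep(c(2008, 2009, 2010), times = c(6, 9, 6))
)

sales_wide <- sales %>%
  group_by(agent, year) %>%
  # one row per agent and year: number of rows and total sales
  summarise(transaction = n(), sales = sum(sales), .groups = "drop") %>%
  # illustrative helper column: "yr08", "yr09", "yr10"
  mutate(yr = paste0("yr", substr(year, 3, 4))) %>%
  select(-year) %>%
  # widen both measures at once; absent agent/year pairs become NA
  pivot_wider(
    names_from  = yr,
    values_from = c(transaction, sales),
    names_glue  = "{yr}_{.value}"
  )

sales_wide
```

By default the columns come out grouped by measure (all the transaction columns, then all the sales columns) rather than by year. Recent tidyr versions also have a names_vary argument that may help with the ordering, but a plain select() can reorder them as well.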
Answer 1 (score: 1)
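Here is an option with data.table. Summarise to get the number of observations (.N) and the sum of 'sales', grouped by 'agent' and 'year', and then dcast to 'wide' format.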
library(data.table)
dcast(setDT(sales)[, .(transaction = .N, Sumsales = sum(sales)), by = .(agent, year)],
agent ~ substr(year, 3, 4), value.var = c('transaction', 'Sumsales'))
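With several value.var columns, dcast names the result columns like transaction_08 and Sumsales_08. As a hedged, self-contained sketch of the same approach (an illustration, not the answerer's code), the version below builds its own data.table instead of converting sales in place with setDT, computes the two-digit year in by, and then renames the columns to the yr08_transaction style from the question. The yr column and the renaming regex are assumptions introduced here.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
sales_dt <- data.table(
  agent = c("agent1", "agent4", "agent1", "agent1", "agent1", "agent4",
            "agent1", "agent3", "agent4", "agent1", "agent3", "agent4",
            "agent1", "agent3", "agent4",
            "agent1", "agent4", "agent2", "agent2", "agent2", "agent4"),
  sales = c(100, 200, 300, 130, 200, 400,
            200, 500, 200, 200, 100, 100, 200, 300, 200,
            130, 300, 100, 200, 100, 200),
  year  = rep(c(2008, 2009, 2010), times = c(6, 9, 6))
)

# Count rows and sum sales per agent and two-digit year ("08", "09", "10")
agg <- sales_dt[, .(transaction = .N, sales = sum(sales)),
                by = .(agent, yr = substr(year, 3, 4))]

# Cast to wide; absent agent/year pairs are filled with NA by default
wide <- dcast(agg, agent ~ yr, value.var = c("transaction", "sales"))

# Turn "transaction_08" into "yr08_transaction"; "agent" is left unchanged
setnames(wide, sub("^(.+)_(\\d{2})$", "yr\\2_\\1", names(wide)))

wide
```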
Answer 2 (score: 0)
Using dplyr with right_join:
sales$agent <- as.character(sales$agent)
sales %>%
  filter(year == 2008) %>%
  group_by(agent) %>%
  summarise(yr08_transaction = n(), yr08_sales = sum(sales)) %>%
  right_join(sales[!duplicated(sales$agent), c('agent', 'year')], by = "agent") %>%
  arrange(agent) %>%
  select(-year)
# A tibble: 4 x 3
  agent  yr08_transaction yr08_sales
  <chr>             <int>      <dbl>
1 agent1                4        730
2 agent2               NA         NA
3 agent3               NA         NA
4 agent4                2        600
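The right_join above covers a single year. One possible way to get the same NA rows for every year at once, while staying in long format, is tidyr's complete, which adds a row for each agent/year combination that is missing from the summary. This is only a sketch under the assumption that dplyr >= 1.0.0 and tidyr are available; it is an alternative to the join, not what the answer itself does.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, built in one step
sales <- data.frame(
  agent = c("agent1", "agent4", "agent1", "agent1", "agent1", "agent4",
            "agent1", "agent3", "agent4", "agent1", "agent3", "agent4",
            "agent1", "agent3", "agent4",
            "agent1", "agent4", "agent2", "agent2", "agent2", "agent4"),
  sales = c(100, 200, 300, 130, 200, 400,
            200, 500, 200, 200, 100, 100, 200, 300, 200,
            130, 300, 100, 200, 100, 200),
  year  = rep(c(2008, 2009, 2010), times = c(6, 9, 6))
)

sales_long <- sales %>%
  group_by(year, agent) %>%
  summarise(transaction = n(), sales = sum(sales), .groups = "drop") %>%
  # add the agent/year pairs that never occur; their measures become NA
  complete(year, agent)

# Same shape as the single-year result above, here for 2008
sales_long %>% filter(year == 2008)
```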