I have a dataset containing item prices from different branches of several store chains. It looks like this:
Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100
What I want: for each unique combination of Item and Chain, compute the median price across the three branches.
I tried something like
aggregate(data[,3:5], list(data$Item, data$Chain), median)
but it didn't work. Any ideas on how to solve this?
Answer 0 (score: 1)
You can use group_by() and summarise():
library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(Item = c("Laptop","Laptop","Laptop","TV","TV","TV"),
             Chain = c("Sears","JCP","Macys","Sears","JCP","Macys"),
             Branch1 = c(1000,1300,1500,800,400,900),
             Branch2 = c(1100,900,1800,600,600,1000),
             Branch3 = c(900,1200,1700,700,700,1100))

# Each Item/Chain group holds one row, so c() pools its three branch prices
df %>%
  group_by(Item, Chain) %>%
  summarise(median = median(c(Branch1, Branch2, Branch3)))
Answer 1 (score: 1)
The problem is that aggregate() aggregates each column separately.
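To make that concrete, here is a minimal base R sketch, not part of the original answer. It rebuilds the question's data with read.csv() under the assumed name dat, shows what the per-column aggregation returns, and then stacks the branch columns into one assumed column called Price so that aggregate() sees a single value column per group.

```r
# Rebuild the question's data; the name `dat` is an assumption for this sketch.
dat <- read.csv(text = "Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100")

# aggregate() applies median() to each Branch column on its own within a group.
# Every Item/Chain group has exactly one row, so the prices come back unchanged.
per_column <- aggregate(dat[, 3:5], list(Item = dat$Item, Chain = dat$Chain), median)

# Stack the three branch columns into one long column (`Price` is a made-up name).
# unlist() goes column by column, so the key rows are repeated three times in order.
long <- cbind(dat[rep(seq_len(nrow(dat)), times = 3), c("Item", "Chain")],
              Price = unlist(dat[, 3:5], use.names = FALSE))

# Now there is one value column, and median() pools all three branches per group.
per_group <- aggregate(Price ~ Item + Chain, data = long, FUN = median)
per_group
```

per_column still has three Branch columns holding the original prices, while per_group has one median per Item/Chain pair.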
For the sake of completeness, here are some alternative approaches:
apply()
dat$median <- apply(dat[, 3:5], 1L, median)
dat
     Item Chain Branch1 Branch2 Branch3 median
1: Laptop Sears    1000    1100     900   1000
2: Laptop   JCP    1300     900    1200   1200
3: Laptop Macys    1500    1800    1700   1700
4:     TV Sears     800     600     700    700
5:     TV   JCP     400     600     700    600
6:     TV Macys     900    1000    1100   1000
data.table
library(data.table)
setDT(dat)[, .(median = median(c(Branch1, Branch2, Branch3))), by = .(Item, Chain)]
     Item Chain median
1: Laptop Sears   1000
2: Laptop   JCP   1200
3: Laptop Macys   1700
4:     TV Sears    700
5:     TV   JCP    600
6:     TV Macys   1000
data.table with reshaping
Following neilfws' suggestion, reshape from wide to long format before aggregating:
library(data.table)
melt(setDT(dat), c("Item", "Chain"))[, .(median = median(value)), by = .(Item, Chain)]
     Item Chain median
1: Laptop Sears   1000
2: Laptop   JCP   1200
3: Laptop Macys   1700
4:     TV Sears    700
5:     TV   JCP    600
6:     TV Macys   1000
Because data and df are names of R functions, I use a different name to avoid name clashes that are hard to debug:
dat <- data.table::fread("
Item,Chain,Branch1,Branch2,Branch3
Laptop,Sears,1000,1100,900
Laptop,JCP,1300,900,1200
Laptop,Macys,1500,1800,1700
TV,Sears,800,600,700
TV,JCP,400,600,700
TV,Macys,900,1000,1100")
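As a small illustration of the name clash mentioned above (a sketch, assuming a fresh R session with the default packages attached): df and data already resolve to functions, so code that reaches for a data frame under one of those names before it has been created fails with an error that says nothing about a missing object.

```r
# In a fresh session both names already exist, as functions.
is.function(df)    # stats::df, the density of the F distribution
is.function(data)  # utils::data, which loads bundled data sets

# If the assignment that should create `df` never ran, `df` is still the
# function, and subsetting it fails without any "object not found" hint.
msg <- tryCatch(df$Item, error = function(e) conditionMessage(e))
msg
```

With a name such as dat, the same mistake produces an "object 'dat' not found" error, which points straight at the cause.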