Question

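We have a large dataset of 150 beer brands sold in 85 stores over 399 weeks. The brands are further divided into sub-brands (e.g. brand = Budweiser, with sub-brands such as Budweiser Light / Budweiser Regular). We want to write a function that adds a new column giving the average price per brand whenever:
- the brand is the same
- the week is the same
- the store is the same

So our goal is a column that shows one average price per brand, per store, per week (e.g. Budweiser in store 1 in week 1). We are struggling to write this if statement / loop because we are still quite new to R.

So far we have tried to work out this step without a loop first. We picked one specific store, brand, and week, and added a helper column of ones. That lets us compute mean_price by summing the (log) prices of all sub-brands for that store and week and dividing by the number of sub-brands (obtained by summing the column of ones).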

try1 <- subset(beer, select = c("brand","week","store","price_ounce","logprice_ounce", "sales_ounce","logsales_ounce"))

try1$vector <- c(1)

store5 <- subset(try1, store==5 & week==224 & brand=="ariel")
mean_price <- (sum(store5$logprice_ounce)/(sum(store5$vector)))
View(mean_price)

So far this leads to only one mean price, but we would like to have a column that displays 1 mean price per brand & store & week.
In the end, we need this to perform a regression to estimate price elasticities per store.

We are looking forward to any kind of help as we are completely lost.
Thank you in advance!

Answer 1

The dplyr library is well suited to this kind of analysis. You can get the mean per store/brand/week in dplyr like this:

library(dplyr)

brand <- c("bud", "bud", "bud")
week <- c(1,1,1)
store <- c("A", "A", "A")
price_ounce <- c(2,3,2.2)

data <- data.frame(brand, week, store, price_ounce) %>%
  mutate(logprice_ounce = log(price_ounce))

answer <- data %>% 
  group_by(brand, week, store) %>%
  summarise(meanPrice = mean(price_ounce),
            geomMeanPrice = exp(mean(logprice_ounce)))
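
Note that summarise() returns one row per group. If the goal is a new column on the original rows, as the question asks, mutate() on the grouped data is one way to get it. A minimal sketch on toy data (the values and the name with_mean are made up for illustration):

```r
library(dplyr)

# Toy data (made-up values): two brands in the same store and week
toy <- data.frame(
  brand       = c("bud", "bud", "bud", "ariel"),
  week        = c(1, 1, 1, 1),
  store       = c("A", "A", "A", "A"),
  price_ounce = c(2, 3, 2.2, 4)
)

# mutate() keeps every row and repeats the group mean on each of them
with_mean <- toy %>%
  group_by(brand, week, store) %>%
  mutate(meanPrice = mean(price_ounce)) %>%
  ungroup()
```

Each "bud" row should then carry the same meanPrice, and the single "ariel" row its own price.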

You may find this book useful: R for Data Science

Answer 2

In fact, you don't need any loop to do what you want. For example, you can use the data.table library:

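library(data.table)
setDT(beer)  # convert beer to a data.table by reference (needed if it is a plain data.frame)
# add a Mean column holding the average price per brand/week/store group
beer[, Mean := mean(price_ounce), by = list(brand, week, store)]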
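
As a quick sanity check, the same `:=` assignment can be tried on a small table first. This is only a sketch: beer_toy and its values are made up for illustration and are not part of the real dataset.

```r
library(data.table)

# Toy table (made-up values): two sub-brand rows for "bud", one for "ariel"
beer_toy <- data.table(
  brand       = c("bud", "bud", "ariel"),
  week        = c(1, 1, 1),
  store       = c(5, 5, 5),
  price_ounce = c(2, 3, 4)
)

# := adds the column in place; every row keeps its group's mean
beer_toy[, Mean := mean(price_ounce), by = list(brand, week, store)]
```

The table keeps all three rows, with the two "bud" rows sharing one Mean value.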

You could also do this with another library called dplyr, but I encourage you to take a look at data.table, which is faster on large datasets.
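
If installing a package is not an option, base R's ave() can probably do the same job without a loop. A minimal sketch, assuming a plain data frame with the question's column names (beer_df and its values are made up for illustration):

```r
# Toy data frame (made-up values) using the question's column names
beer_df <- data.frame(
  brand       = c("bud", "bud", "ariel"),
  week        = c(224, 224, 224),
  store       = c(5, 5, 5),
  price_ounce = c(2, 3, 4)
)

# ave() returns a vector as long as its input, holding each row's group mean
beer_df$mean_price <- ave(
  beer_df$price_ounce,
  beer_df$brand, beer_df$week, beer_df$store,
  FUN = mean
)
```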

Hope this helps.

How do we build a loop / if statement that satisfies multiple conditions at once?
