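I have a data frame made up of x unique combinations of region and channel. I need to use some kind of loop to fit a separate regression model for each of the x combinations.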
region channel date trials spend
EMEA display 2015-01-01 62 17875.27
APAC banner 2015-01-01 65 18140.93
Roughly what I have in mind:
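model <- list()
i <- 1
for (r in unique(df$region)) {
  for (ch in unique(df$channel)) {
    # rows belonging to this region/channel combination
    df1 <- df[df$region == r & df$channel == ch, ]
    if (nrow(df1) == 0) next  # skip combinations that do not occur
    model[[i]] <- lm(trials ~ spend, data = df1)
    i <- i + 1
  }
}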
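It would also be very helpful if anyone knows a way to store a unique identifier, such as region + channel, with each model so that the regression models can be identified.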
Answer 0 (score: 3)
A plyr solution:
set.seed(1)
d <- data.frame(region = letters[1:2],
channel = LETTERS[3:6],
trials = runif(20),
spend = runif(20))
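Build a list of results (that is, split d by region and channel, run lm with the specified formula on each chunk, and return the results as a list):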
library(plyr)
res <- dlply(d,c("region","channel"), lm,
formula=trials~spend)
Extract the coefficients as a data frame:
ldply(res,coef)
## region channel (Intercept) spend
## 1 a C 0.3359747 0.2444105
## 2 a E 0.7767959 -0.3745419
## 3 b D 0.7409942 -0.8084751
## 4 b F 1.0797439 -1.0872158
Note that the result includes the region/channel identifiers you wanted.
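The same split-and-fit idea can also yield per-model summaries other than coefficients. The following is only a sketch, not part of the plyr solution above: it assumes plyr is not available, so base R's split and lapply stand in for dlply, and the names groups, fits and stats are made up for illustration.

```r
# Same dummy data as above
set.seed(1)
d <- data.frame(region = letters[1:2],
                channel = LETTERS[3:6],
                trials = runif(20),
                spend = runif(20))

# Base R stand-in for dlply: one chunk per region/channel combination
# that actually occurs (drop = TRUE discards empty combinations)
groups <- split(d, list(d$region, d$channel), drop = TRUE)

# Fit one model per chunk; the list names (e.g. "a.C") identify each model
fits <- lapply(groups, function(chunk) lm(trials ~ spend, data = chunk))

# One row per model: identifier, number of observations, R-squared
stats <- data.frame(id = names(fits),
                    n = unname(sapply(fits, nobs)),
                    r_squared = unname(sapply(fits, function(m) summary(m)$r.squared)))
stats
```

Because split names each chunk after its region and channel, the identifier travels with the model through every later lapply or sapply call.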
Answer 1 (score: 2)
Split the data into a list by the combination of the two columns, then use lapply to run lm on each subset of the data. See this example:
# dummy data
set.seed(1)
d <- data.frame(region = letters[1:2],
channel = LETTERS[3:6],
trials = runif(20),
spend = runif(20))
# split by 2 column combo
dSplit <- split(d, paste(d$region, d$channel, sep = "_"))
# run lm for each subset
res <- lapply(dSplit, lm, formula = trials ~ spend)
# check names
names(res)
# [1] "a_C" "a_E" "b_D" "b_F"
# lm result for selected combo "a_C"
res$a_C
# Call:
# lm(formula = trials ~ spend, data = i)
#
# Coefficients:
# (Intercept) spend
# 0.3360 0.2444
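If a table is more convenient than a named list, the names can be turned back into region and channel columns. This is a rough sketch rather than part of the answer above: it assumes that neither region nor channel values contain an underscore (otherwise the split on "_" would be ambiguous), and the names coefs, ids and out are invented for illustration.

```r
# dummy data and models, as in the example above
set.seed(1)
d <- data.frame(region = letters[1:2],
                channel = LETTERS[3:6],
                trials = runif(20),
                spend = runif(20))
dSplit <- split(d, paste(d$region, d$channel, sep = "_"))
res <- lapply(dSplit, lm, formula = trials ~ spend)

# stack the coefficient vectors; row names are the "region_channel" keys
coefs <- do.call(rbind, lapply(res, coef))

# recover the two identifiers from the keys (assumes no "_" inside the values)
ids <- do.call(rbind, strsplit(rownames(coefs), "_", fixed = TRUE))

# one row per model, with explicit identifier columns
out <- data.frame(region = ids[, 1],
                  channel = ids[, 2],
                  intercept = unname(coefs[, "(Intercept)"]),
                  slope = unname(coefs[, "spend"]))
out
```

Choosing a separator that cannot appear in the data keeps this round trip safe. Keeping the identifiers in a separate lookup table keyed by names(res) would avoid parsing altogether.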