I have some golf data that looks like this:
ID round GIR score
Tiger Woods 3 0.666666667 68
Tiger Woods 2 0.611111111 75
Tiger Woods 1 0.666666667 71
Adam Scott 3 0.611111111 68
Adam Scott 2 0.888888889 68
Adam Scott 1 0.666666667 66
I am trying to build a linear model that predicts my round 4 score from greens in regulation (GIR). Here is my script so far:
#load in data
gir2 <- read.csv("girforscore.csv")
#establish linear model
fit <- lm(score ~ GIR * ID, data = gir2)
#apply linear model
lmresultsGIR <- setNames(predict(fit, newdata = data.frame(ID = unique(gir2$ID), GIR = .6111111)),
unique(gir2$ID))
#show model
head(lmresultsGIR, n=10)
My question is: suppose I have round 4 GIR data like this:
ID round GIR
Tiger Woods 4 0.666666667
Tiger Woods 4 0.611111111
How can I update my script to pick up the round 4 GIR data by ID, instead of hard-coding the magic value .6111111 as I do now?
Answer 0 (score: 1)
Give this a try.
#load in data
gir2 <- read.csv("girforscore.csv")
#drop rows with missing values (e.g. round 4 rows that have no score yet)
model <- na.omit(gir2)
#establish linear model
fit <- lm(score ~ ID + GIR, data = model)
#subset data for round 4
round4 <- subset(gir2, round == 4)
#apply linear model
#store the result as pred so the predict() function is not masked
pred <- predict(fit, newdata = round4, se.fit = TRUE)
#easier than setNames for this particular example
round4$score <- pred$fit
#view round 4 predicted scores
round4
ID round GIR score
7 Tiger Woods 4 0.6666667 71.29545
8 Tiger Woods 4 0.6111111 71.40909
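
If the round 4 GIR values live in their own data frame rather than in the same CSV, the same idea still applies: pass that data frame as `newdata` and `predict()` matches each row to its player through the `ID` column. Below is a minimal sketch under that assumption. It types in only the six sample rows from the question, so the fitted numbers will not match the output above, which came from the full file. The data frame name `round4` and the inline data are illustrative and not part of the original script.

```r
# Sample rows from the question, typed inline so the sketch runs without the CSV
gir2 <- data.frame(
  ID    = c("Tiger Woods", "Tiger Woods", "Tiger Woods",
            "Adam Scott", "Adam Scott", "Adam Scott"),
  round = c(3, 2, 1, 3, 2, 1),
  GIR   = c(0.666666667, 0.611111111, 0.666666667,
            0.611111111, 0.888888889, 0.666666667),
  score = c(68, 75, 71, 68, 68, 66)
)

# Round 4 GIR values as given in the question (assumed to be a separate table)
round4 <- data.frame(
  ID    = c("Tiger Woods", "Tiger Woods"),
  round = c(4, 4),
  GIR   = c(0.666666667, 0.611111111)
)

# Additive model: one intercept shift per player plus a common GIR slope
fit <- lm(score ~ ID + GIR, data = gir2)

# predict() reads ID and GIR from each row of newdata, so no value is hard-coded
round4$score <- predict(fit, newdata = round4)

round4
```

Every `ID` in `round4` has to appear in the data used to fit the model; otherwise `predict()` stops with an error about a new factor level.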