我想优化代码以使常量分段函数适合我的数据。我有这样的df:
df = data.frame (x = 1:180,
y = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,
0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,4,2,2,3,2,1,2,0,1,0,1,4,
0,1,2,3,1,1,1,0,2,0,3,2,1,1,1,1,5,4,2,1,0,2,1,1,2,0,0,2,2,
1,1,1,0,0,0,0,2,3,0,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0))
使用第二个答案here,作者将其称为“蛮力”,我编写了以下代码,将相同的方法应用于960个数据帧:
# Set the possible values of k1 and k2
k1 <- df$x[df$x < df$x[df$y == max(df$y)]]
k2 <- df$x[df$x > df$x[df$y == max(df$y)]]
# get all combinations of k1 and k2
grid <- subset(expand.grid(k1 = k1, k2 = k2), k1 < k2)
store <- numeric(nrow(grid))
for(j in 1:nrow(grid)){
k1 <- grid$k1[j]
k2 <- grid$k2[j]
model <- lm( y ~ I(x < k1) + I(x >= k1 & x < k2) + I(x >= k2),data = df)
# store sigmas for every pair of values
store[j] <- sigma(model)
}
# Get those values which minimize the residual standard deviation
bp <- grid[which.min(store),]
# Look for the minimum sigma (dot in red)
plot(unlist(store),pch =21,bg = "lightblue",
ylab = "Sigma",xlab = "Pairs of k1 and k2")
points(which.min(store),
unlist(store)[which.min(store)],col = "red",pch = 16)
# Fit a model with the best k
fit <- lm(y ~ I(x < bp[1,"k1"]) + I(x >= bp[1,"k1"] & x < bp[1,"k2"]) + I(x >= bp[1,"k2"]),data = df)
plot(df,pch = 21,bg = "orange")
lines(fitted(fit),col = "blue",lwd = 2)
此代码可以正常工作(我在随机数据帧中进行了一些目视检查),但是在我的计算机上,它需要近6个小时才能完成。有人可以提出更好的方法来达到相同的结果吗?如果可以在保持代码可读性的同时做到这一点,那就更好了,但这不是必须的。
预先感谢