Question

我想优化代码以使常量分段函数适合我的数据。我有这样的df：

df = data.frame (x = 1:180,
                 y = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,
                       0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,4,2,2,3,2,1,2,0,1,0,1,4,
                       0,1,2,3,1,1,1,0,2,0,3,2,1,1,1,1,5,4,2,1,0,2,1,1,2,0,0,2,2,
                       1,1,1,0,0,0,0,2,3,0,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                       0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                       0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                       0,0,0,0,0,0))

使用第二个答案here，作者将其称为“蛮力”，我编写了以下代码，将相同的方法应用于960个数据帧：

# Set the possible values of k1 and k2
k1 <- df$x[df$x < df$x[df$y == max(df$y)]]
k2 <-  df$x[df$x > df$x[df$y == max(df$y)]]
# get all combinations of k1 and k2
grid <- subset(expand.grid(k1 = k1, k2 = k2), k1 < k2)

store <- numeric(nrow(grid))

for(j in 1:nrow(grid)){
  k1 <- grid$k1[j]
  k2 <- grid$k2[j]
  model <- lm( y ~ I(x < k1) + I(x >= k1 & x < k2) + I(x >= k2),data = df)
# store sigmas for every pair of values
  store[j] <-  sigma(model)
}
# Get those values which minimize the residual standard deviation
bp <- grid[which.min(store),]

# Look for the minimum sigma (dot in red)
plot(unlist(store),pch =21,bg = "lightblue",
     ylab = "Sigma",xlab = "Pairs of k1 and k2")
points(which.min(store),
       unlist(store)[which.min(store)],col = "red",pch = 16)

# Fit a model with the best k

fit <- lm(y ~ I(x < bp[1,"k1"]) + I(x >= bp[1,"k1"] & x < bp[1,"k2"]) + I(x >= bp[1,"k2"]),data = df)

plot(df,pch = 21,bg = "orange")
lines(fitted(fit),col = "blue",lwd = 2)

此代码可以正常工作（我在随机数据帧中进行了一些目视检查），但是在我的计算机上，它需要近6个小时才能完成。有人可以提出更好的方法来达到相同的结果吗？如果可以在保持代码可读性的同时做到这一点，那就更好了，但这不是必须的。

预先感谢

针对许多数据帧优化R中的分段常数拟合

0 个答案: