Question

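I was helping a friend with segmented regression today. We were trying to fit a segmented regression with a break point to see whether it fits the data better than a standard linear model.

I ran into a problem I cannot understand. When a segmented regression with a single break point is fitted to the data provided, the model does indeed estimate a single break point.

However, when you predict from the model, the result looks as if it has 2 break points. This does not happen when the model is plotted with plot.segmented().

Does anyone know what is going on and how I can get the correct predictions (and standard errors etc.)? Or am I doing something wrong in the code in general?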

# load packages
library(segmented)

# make data
d <- data.frame(x = c(0, 3, 13, 18, 19, 19, 26, 26, 33, 40, 49, 51, 53, 67, 70, 88),
                y = c(0, 3.56211608128595, 10.5214485148819, 3.66063708049802, 6.11000808621074, 
                      5.51520423804034, 7.73043895812661, 7.90691392857039, 6.59626527933846, 
                      10.4413913666936, 8.71673928545967, 9.93374157928462, 1.214860139929, 
                      3.32428882257746, 2.65223361387063, 3.25440939462105))

# fit normal linear regression and segmented regression
lm1 <- lm(y ~ x, d)
seg_lm <- segmented(lm1, ~ x)

slope(seg_lm)
#> $x
#>            Est.  St.Err. t value CI(95%).l   CI(95%).u
#> slope1  0.17185 0.094053  1.8271 -0.033079  0.37677000
#> slope2 -0.15753 0.071933 -2.1899 -0.314260 -0.00079718

# make predictions
preds <- data.frame(x = d$x, preds = predict(seg_lm))

# plot segmented fit
plot(seg_lm, res = TRUE)

# plot predictions
lines(preds$preds ~ preds$x, col = 'red')

Created on 2018-07-27 by the reprex package (v0.2.0).

Answer 1

This is purely a plotting issue. Printing the fitted model gives:

#Call: segmented.lm(obj = lm1, seg.Z = ~x)
#
#Meaningful coefficients of the linear terms:
#(Intercept)            x         U1.x  
#     2.7489       0.1712      -0.3291  
#
#Estimated Break-Point(s):
#psi1.x  
# 37.46

The estimated break point is at x = 37.46, which is not one of the sampling locations:

d$x
# [1]  0  3 13 18 19 19 26 26 33 40 49 51 53 67 70 88

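If you draw the fit from the fitted values at those sampling locations,

preds <- data.frame(x = d$x, preds = predict(seg_lm))
lines(preds$preds ~ preds$x, col = 'red')

you will not see the two fitted segments meet at the break point, because lines() simply joins the fitted values one after another. The red line therefore bends at x = 33 and x = 40, the two sampled points on either side of 37.46, which is what looks like 2 break points. plot.segmented(), by contrast, takes the break point into account and draws the fit correctly.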
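To make the effect concrete, here is a minimal base R sketch. It does not use the fitted object; it rebuilds the piecewise line by hand from the rounded coefficients printed above, so the numbers are approximate. fhat() and the other variable names are hypothetical helpers introduced only for this illustration.

```r
# Rounded coefficients and break point copied from the printed model
b0  <- 2.7489    # (Intercept)
b1  <- 0.1712    # slope before the break point
u1  <- -0.3291   # change in slope after the break point (U1.x)
psi <- 37.46     # estimated break point

# Hypothetical helper: the piecewise linear fit written out by hand
fhat <- function(x) b0 + b1 * x + u1 * pmax(x - psi, 0)

# The two sampled x values on either side of the break point
x_left  <- 33
x_right <- 40

# What lines() shows at x = psi: a straight chord between the two neighbours
chord_at_psi <- approx(c(x_left, x_right), fhat(c(x_left, x_right)),
                       xout = psi)$y

# What the model actually predicts at the break point
true_at_psi <- fhat(psi)

# Positive gap: the chord passes below the corner of the fitted model
gap <- true_at_psi - chord_at_psi
```

Because the chord between x = 33 and x = 40 misses the corner, the joined line changes direction once at each of those two points instead of once at the break point.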

Try the following instead:

## the fitted model is piecewise linear between boundary points and break points
xp <- c(min(d$x), seg_lm$psi[, "Est."], max(d$x))
yp <- predict(seg_lm, newdata = data.frame(x = xp))

plot(d, col = 8, pch = 19)  ## observations
lines(xp, yp)  ## fitted model
points(d$x, fitted(seg_lm), pch = 19)  ## fitted values at the sampled x
abline(v = d$x, col = 8, lty = 2)  ## highlight sampling locations
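If you would rather keep the sampled x values in the prediction data frame, one option is to merge the break point into them before predicting. The sketch below shows only that merging step in base R; add_breaks() is a hypothetical helper, not part of segmented, and 37.46 is the rounded break point from the printed model (with the fitted object at hand, seg_lm$psi[, "Est."] gives it directly).

```r
# Hypothetical helper: sorted, de-duplicated x values with break points added
add_breaks <- function(x, psi) sort(unique(c(x, psi)))

# Sampled x values from the question
x_obs <- c(0, 3, 13, 18, 19, 19, 26, 26, 33, 40, 49, 51, 53, 67, 70, 88)

# Grid that includes the estimated break point (rounded value from the output)
xp_all <- add_breaks(x_obs, 37.46)

# With the fitted model available, the red line could then be drawn as:
# lines(xp_all, predict(seg_lm, newdata = data.frame(x = xp_all)), col = "red")
```

Since the fit is linear between consecutive points of this grid, joining the predictions with lines() reproduces the single corner at the break point.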