Question

A simple regression of random normal data on dates fails, but identical data with small integers instead of dates works as expected.

# Example dataset with 100 observations at 2 second intervals.
set.seed(1)
df <- data.frame(x=as.POSIXct("2017-03-14 09:00:00") + seq(0, 199, 2),
                 y=rnorm(100))

#> head(df)
#                     x          y
# 1 2017-03-14 09:00:00 -0.6264538
# 2 2017-03-14 09:00:02  0.1836433
# 3 2017-03-14 09:00:04 -0.8356286

# Simple regression model.
m <- lm(y ~ x, data=df)

The slope is missing because of singularities in the data. Calling summary() shows this:

summary(m)

# Coefficients: (1 not defined because of singularities)
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.10889    0.08982   1.212    0.228
# x                 NA         NA      NA       NA
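The aliasing can also be checked programmatically. The sketch below rebuilds the example (with `tz = "UTC"` added as an assumption, so the result does not depend on the local time zone) and compares the number of model-matrix columns with the numerical rank that `lm()` found:

```r
# Rebuild the example; tz = "UTC" is an assumption added for reproducibility.
set.seed(1)
df <- data.frame(x = as.POSIXct("2017-03-14 09:00:00", tz = "UTC") + seq(0, 199, 2),
                 y = rnorm(100))
m <- lm(y ~ x, data = df)

ncol(model.matrix(m))  # 2 columns: intercept and x
m$rank                 # numerical rank found by the QR decomposition (1 here)
coef(m)                # the slope for x is NA because it was declared aliased
```

A rank smaller than the number of columns is exactly what the "(1 not defined because of singularities)" line in summary() reports.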

Could this be because of the POSIXct class?

# Convert date variable to integer.
df$x2 <- as.integer(df$x)
lm(y ~ x2, data=df)

# Coefficients:
# (Intercept)           x2  
#      0.1089           NA

No, the coefficient for x2 is still missing.
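One way to see why the conversion changes nothing: a `POSIXct` value is stored as a double counting seconds since 1970-01-01 UTC, so `as.integer()` keeps the same numbers (about 1.49e9) and only changes their storage type. A quick check, again assuming `tz = "UTC"`:

```r
set.seed(1)
df <- data.frame(x = as.POSIXct("2017-03-14 09:00:00", tz = "UTC") + seq(0, 199, 2),
                 y = rnorm(100))
df$x2 <- as.integer(df$x)

typeof(unclass(df$x))                            # "double": seconds since the epoch
identical(as.numeric(df$x), as.numeric(df$x2))   # TRUE: same values, different type
range(df$x2)                                     # both ends are about 1.49e9
```

Because the values are identical, the model matrix for `y ~ x2` is numerically the same as for `y ~ x`, so the same rank decision is made.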

What if we shift x2 so that its baseline is zero?

# Subtract minimum of x.
df$x3 <- df$x2 - min(df$x2)
lm(y ~ x3, data=df)

# Coefficients:
# (Intercept)           x3  
#   0.1312147   -0.0002255

This works!

Another example, to rule out the datetime variable as the cause:

# Subtract large constant from date (data is now from 1985).
df$x4 <- df$x - 1000000000
lm(y ~ x4, data=df)

# Coefficients:
# (Intercept)           x4  
#   1.104e+05   -2.255e-04

Unexpected (why would the same data set, shifted by 30 years, behave differently?), but this also works.
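Mathematically, shifting x by a constant only changes the intercept: since x3 = x4 - min(x4), the slope is identical and the x4 intercept equals the x3 intercept minus slope × min(x4). The two successful fits above agree with this (slope -2.255e-04 in both), which points to a numerical rather than a statistical difference. A sketch of the check, assuming `tz = "UTC"` and keeping x4 as a plain number rather than `POSIXct`:

```r
set.seed(1)
df <- data.frame(x = as.POSIXct("2017-03-14 09:00:00", tz = "UTC") + seq(0, 199, 2),
                 y = rnorm(100))
x_num <- as.numeric(df$x)
df$x3 <- x_num - min(x_num)  # shifted so the first time point is 0
df$x4 <- x_num - 1e9         # shifted back to 1985 (numeric, not POSIXct)

b3 <- coef(lm(y ~ x3, data = df))
b4 <- coef(lm(y ~ x4, data = df))

# Slopes agree up to rounding error
all.equal(unname(b3["x3"]), unname(b4["x4"]), tolerance = 1e-6)

# Intercepts differ by slope * min(x4), because x3 = x4 - min(x4)
all.equal(unname(b4["(Intercept)"]),
          unname(b3["(Intercept)"] - b3["x3"] * min(df$x4)),
          tolerance = 1e-6)
```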

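Perhaps .Machine$integer.max (2147483647 on my computer) has something to do with it, but I can't figure it out. I'd be grateful if someone could explain what is going on here.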

Answer 1

Yes, it can. QR factorization is numerically stable, but it is not almighty.

X <- cbind(1, 1e+11 + 1:10000)
qr(X)$rank
# 1

Here X is like the model matrix of your linear regression: it has a column of ones for the intercept and a datetime sequence (note the large offset).
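The rank decision is made by a relative tolerance, not by the size of the integers. By default `lm()` fits through `lm.fit()`, which uses R's modified LINPACK QR routine (`dqrdc2`) with `tol = 1e-7`; roughly, a column is treated as negligible when its norm, after the earlier columns have been projected out, falls below `tol` times its original norm. Projecting out the intercept leaves the centered column, so the sketch below (a simplified reproduction of that test, not the Fortran code itself; `norm_ratio` is a hypothetical helper, and `tz = "UTC"` is assumed) compares that ratio with `tol` for the 2017 and 1985 versions of x:

```r
x2017 <- as.numeric(as.POSIXct("2017-03-14 09:00:00", tz = "UTC") + seq(0, 199, 2))
x1985 <- x2017 - 1e9   # the question's x4
tol <- 1e-7            # default tolerance of lm.fit() and qr()

# Hypothetical helper: norm left after removing the mean, relative to the raw norm
norm_ratio <- function(x) sqrt(sum((x - mean(x))^2)) / sqrt(sum(x^2))

norm_ratio(x2017)   # about 3.9e-8: below tol, so x is treated as aliased
norm_ratio(x1985)   # about 1.2e-7: just above tol, so x is kept

# The answer's toy matrix behaves the same way; lowering tol changes the verdict
X <- cbind(1, 1e+11 + 1:10000)
qr(X)$rank               # 1 with the default tol = 1e-7
qr(X, tol = 1e-10)$rank  # 2, since the ratio here is about 2.9e-8
```

This is why the 30-year shift matters: it lowers the raw norm of x by about a factor of three while leaving the spread of the data unchanged, which is just enough to cross the threshold. Passing a smaller tolerance (`lm()` forwards `tol` to `lm.fit()`) would also keep the slope, but centering the predictor is the more robust fix.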

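If you center the datetime column, the two columns become orthogonal, so the problem is very stable (even if you solve the normal equations directly!).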
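A sketch of that claim on the question's data (`tz = "UTC"` assumed): after centering, the off-diagonal entries of X'X are sum(x - mean(x)), which is zero, so even a direct `solve()` of the normal equations recovers the same coefficients as `lm()`:

```r
set.seed(1)
x <- as.numeric(as.POSIXct("2017-03-14 09:00:00", tz = "UTC") + seq(0, 199, 2))
y <- rnorm(100)

xc <- x - mean(x)      # centered datetime: seconds from the mean time point
Xc <- cbind(1, xc)     # model matrix: intercept column + centered x

XtX <- crossprod(Xc)   # t(Xc) %*% Xc; off-diagonal entries equal sum(xc)
XtX

# Solve the normal equations X'X b = X'y directly
beta <- solve(XtX, crossprod(Xc, y))
beta

# Same coefficients as lm() on the centered variable
coef(lm(y ~ xc))
```

With orthogonal columns the intercept is simply mean(y) (the 0.1089 seen in the question's rank-deficient fit), and the slope should match the x3 fit, since centering is just another shift.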

Linear model singular because of large integer datetime in R?
