Question

Suppose I have a dataset named wage that looks like this:

 wage
# A tibble: 935 x 17
    wage hours    iq   kww  educ exper tenure   age married  black  south  urban  sibs brthord meduc
   <int> <int> <int> <int> <int> <int>  <int> <int>  <fctr> <fctr> <fctr> <fctr> <int>   <int> <int>
 1   769    40    93    35    12    11      2    31       1      0      0      1     1       2     8
 2   808    50   119    41    18    11     16    37       1      0      0      1     1      NA    14
 3   825    40   108    46    14    11      9    33       1      0      0      1     1       2    14
 4   650    40    96    32    12    13      7    32       1      0      0      1     4       3    12
 5   562    40    74    27    11    14      5    34       1      0      0      1    10       6     6
 6  1400    40   116    43    16    14      2    35       1      1      0      1     1       2     8
 7   600    40    91    24    10    13      0    30       0      0      0      1     1       2     8
 8  1081    40   114    50    18     8     14    38       1      0      0      1     2       3     8
 9  1154    45   111    37    15    13      1    36       1      0      0      0     2       3    14
10  1000    40    95    44    12    16     16    36       1      0      0      1     1       1    12
# ... with 925 more rows, and 2 more variables: feduc <int>, lwage <dbl>

Say I then fit a simple linear regression of wage on IQ:

m_wage_iq = lm(wage ~ iq, data = wage)
m_wage_iq$coefficients

which gives me:

## (Intercept)          iq 
##  116.991565    8.303064

I want to check that the errors satisfy:

$\epsilon_i \sim N(0, \sigma^2)$

How can I check this using R?

Answer 1

You can go about this in several ways.

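One option is shapiro.test, which tests for normality; apply it to the model's residuals, e.g. shapiro.test(residuals(m_wage_iq)). A p-value larger than your chosen α level (commonly anywhere up to 10%) means you cannot reject the null hypothesis that the errors are normally distributed. Note that this is a failure to reject, not proof of normality. The test is also sensitive to sample size (with many observations, even small departures from normality can come out significant), so you may want to back up the result by looking at a Q-Q plot.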
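
As a rough sketch of the test described above: the real wage data is not reproduced here, so the example simulates a stand-in data frame. The name sim, the coefficients, and the error standard deviation are made up for illustration only; with the actual data you would pass m_wage_iq instead of m_sim.

```r
# Simulated stand-in for the wage data (hypothetical values, not the real data set)
set.seed(1)
n <- 935
sim <- data.frame(iq = round(rnorm(n, mean = 100, sd = 15)))
sim$wage <- 117 + 8.3 * sim$iq + rnorm(n, sd = 380)

# Same model form as in the question
m_sim <- lm(wage ~ iq, data = sim)

# Shapiro-Wilk test on the model residuals, not on the raw response
sw <- shapiro.test(residuals(m_sim))

# Compare the p-value with your chosen alpha level
sw$p.value
```

shapiro.test only accepts between 3 and 5000 observations, which the 935 rows here satisfy.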

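You can get one by plotting the model, plot(m_wage_iq), and looking at the second plot, which is the normal Q-Q plot of the residuals. If the points lie roughly on the reference line, that suggests the errors follow a normal distribution.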
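
A minimal sketch of the Q-Q plot check, again on simulated stand-in data (sim and its parameters are assumptions, not the real wage data). It shows how to request only the second diagnostic plot and how to build an equivalent plot by hand:

```r
# Simulated stand-in for the wage data (hypothetical values)
set.seed(2)
sim <- data.frame(iq = rnorm(200, mean = 100, sd = 15))
sim$wage <- 117 + 8.3 * sim$iq + rnorm(200, sd = 380)
m_sim <- lm(wage ~ iq, data = sim)

# which = 2 selects only the normal Q-Q plot from the lm diagnostics
plot(m_sim, which = 2)

# Roughly the same plot by hand, using standardized residuals
res <- rstandard(m_sim)
qq <- qqnorm(res)  # sample quantiles against theoretical normal quantiles
qqline(res)        # reference line through the first and third quartiles
```

Points that bend away from the line in the tails point to heavier or lighter tails than a normal distribution would have.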
