Interpreting the parameter estimates of a general linear model in R

Time: 2018-03-08 08:31:30

Tags: r statistics linear-regression glm

My dataset looks like this:

"","OBSERV","DIOX","logDIOX","OXYGEN","LOAD","PRSEK","PLANT","TIME","LAB"
"1",1011,984.06650389,6.89169348002254,"L","H","L","RENO_N","1","KK"
"2",1022,1790.7973641,7.49041625445373,"H","H","L","RENO_N","1","USA"
"3",1031,661.95870145,6.4952031694744,"L","H","H","RENO_N","1","USA"
"4",1042,978.06853583,6.88557974511529,"H","H","H","RENO_N","1","KK"
"5",1051,270.92290942,5.60183431332639,"N","N","N","RENO_N","1","USA"
"6",1062,402.98269729,5.99889362626069,"N","N","N","RENO_N","1","USA"
"7",1071,321.71945701,5.77367991426247,"H","L","L","RENO_N","1","KK"
"8",1082,223.15260359,5.40785585845064,"L","L","L","RENO_N","1","USA"
"9",1091,246.65350151,5.507984523849,"H","L","H","RENO_N","1","USA"
"10",1102,188.48323034,5.23900903921703,"L","L","H","RENO_N","1","KK"
"11",1141,267.34994025,5.58855843790491,"N","N","N","RENO_N","1","KK"
"12",1152,452.10355987,6.11391126834609,"N","N","N","RENO_N","1","KK"
"13",2011,2569.6672555,7.85153169693888,"N","N","N","KARA","1","USA"
"14",2021,604.79620572,6.40489155123453,"N","N","N","KARA","1","KK"
"15",2031,2610.4804449,7.86728956188212,"L","H",NA,"KARA","1","KK"
"16",2032,3789.7097503,8.24004471210954,"L","H",NA,"KARA","1","USA"
"17",2052,338.97054188,5.82591320649553,"L","L","L","KARA","1","KK"
"18",2061,391.09027375,5.96893841249289,"H","L","H","KARA","1","USA"
"19",2092,410.04420258,6.01626496505788,"N","N","N","KARA","1","USA"
"20",2102,313.51882368,5.74785940190679,"N","N","N","KARA","1","KK"
"21",2112,1242.5931417,7.12495571830002,"H","H","H","KARA","1","KK"
"22",2122,1751.4827969,7.46821802066524,"H","H","L","KARA","1","USA"
"23",3011,60.48026048,4.10231703874031,"N","N","N","RENO_S","1","KK"
"24",3012,257.27729731,5.55015448107691,"N","N","N","RENO_S","1","USA"
"25",3021,46.74282552,3.84466077914493,"N","N","N","RENO_S","1","KK"
"26",3022,73.605375516,4.29871805996994,"N","N","N","RENO_S","1","KK"
"27",3031,108.25433812,4.68448344109116,"H","H","L","RENO_S","1","KK"
"28",3032,124.40704234,4.82355878915293,"H","H","L","RENO_S","1","USA"
"29",3042,123.66859296,4.81760535031397,"L","H","L","RENO_S","1","KK"
"30",3051,170.05332632,5.13611207209694,"N","N","N","RENO_S","1","USA"
"31",3052,95.868704018,4.56297958887925,"N","N","N","RENO_S","1","KK"
"32",3061,202.69261215,5.31169060558111,"N","N","N","RENO_S","1","USA"
"33",3062,70.686307069,4.25825187761015,"N","N","N","RENO_S","1","USA"
"34",3071,52.034715526,3.95191110210073,"L","H","H","RENO_S","1","KK"
"35",3072,93.33525462,4.53619789950355,"L","H","H","RENO_S","1","USA"
"36",3081,121.47464906,4.79970559129829,"H","H","H","RENO_S","1","USA"
"37",3082,94.833869239,4.55212661590867,"H","H","H","RENO_S","1","KK"
"38",3091,68.624596439,4.22865101914209,"H","L","L","RENO_S","1","USA"
"39",3092,64.837097371,4.17187792984139,"H","L","L","RENO_S","1","KK"
"40",3101,32.351569811,3.47666254561192,"L","L","L","RENO_S","1","KK"
"41",3102,29.285124102,3.37707967726539,"L","L","L","RENO_S","1","USA"
"42",3111,31.36974463,3.44584388158928,"L","L","H","RENO_S","1","USA"
"43",3112,28.127853881,3.33676032670116,"L","L","H","RENO_S","1","KK"
"44",3121,91.825330102,4.51988818660262,"H","L","H","RENO_S","1","KK"
"45",3122,136.4559307,4.91600171048243,"H","L","H","RENO_S","1","USA"
"46",4011,126.11889968,4.83722511024933,"H","L","H","RENO_N","2","KK"
"47",4022,76.520259821,4.33755554003153,"L","L","L","RENO_N","2","KK"
"48",4032,93.551979795,4.53851721545715,"L","L","H","RENO_N","2","USA"
"49",4041,207.09703422,5.33318744777751,"H","L","L","RENO_N","2","USA"
"50",4052,383.44185307,5.94918798759058,"N","N","N","RENO_N","2","USA"
"51",4061,156.79345897,5.05492939129363,"N","N","N","RENO_N","2","USA"
"52",4071,322.72413197,5.77679787769979,"L","H","L","RENO_N","2","USA"
"53",4082,554.05710342,6.31726775620079,"H","H","H","RENO_N","2","USA"
"54",4091,122.55552697,4.80856420867156,"N","N","N","RENO_N","2","KK"
"55",4102,112.70050456,4.72473389805434,"N","N","N","RENO_N","2","KK"
"56",4111,94.245481423,4.54590288271731,"L","H","H","RENO_N","2","KK"
"57",4122,323.16498582,5.77816298482521,"H","H","L","RENO_N","2","KK"

I am using R and have defined a linear model with lm:

lm1 <- lm(logDIOX ~ 1 + OXYGEN + LOAD + PLANT + TIME + LAB, data=data)

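I want to interpret the estimated coefficients. However, when I extract the coefficients I get 'NA' for some of them (I assume this is caused by linear dependence between the variables). How should I interpret the coefficients in that case? I only get a single intercept, which somehow represents one level of every factor included in the model. Is it possible to get an estimate for each factor level?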

> summary(lm1)

Call:
lm(formula = logDIOX ~ OXYGEN + LOAD + PLANT + TIME + LAB, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.90821 -0.32102 -0.08993  0.27311  0.97758 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.2983     0.2110  34.596  < 2e-16 ***
OXYGENL      -0.4086     0.1669  -2.449 0.017953 *  
OXYGENN      -0.7567     0.1802  -4.199 0.000113 ***
LOADL        -1.0645     0.1675  -6.357 6.58e-08 ***
LOADN             NA         NA      NA       NA    
PLANTRENO_N  -0.6636     0.2174  -3.052 0.003664 ** 
PLANTRENO_S  -2.3452     0.1929 -12.158  < 2e-16 ***
TIME2        -0.9160     0.2065  -4.436 5.18e-05 ***
LABUSA        0.3829     0.1344   2.849 0.006392 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5058 on 49 degrees of freedom
Multiple R-squared:  0.8391,    Adjusted R-squared:  0.8161 
F-statistic:  36.5 on 7 and 49 DF,  p-value: < 2.2e-16

1 Answer:

Answer 0 (score: 0)

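For the NA part of your question, have a look at the question "linear regression "NA" estimate just for last coefficient". In short, one of your variables can be written as a linear combination of the others. In your data LOAD is "N" exactly when OXYGEN is "N", so the LOADN dummy carries no information beyond OXYGENN, and lm reports NA for it.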
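To check this kind of dependence yourself, something like the sketch below may help. It uses only six rows copied from the question's data (values rounded), and the names d, tab and fit are just for illustration. It cross-tabulates the two factors and then asks alias() which coefficients are linear combinations of others.

```r
# Six rows copied from the question's data (logDIOX rounded), two factors only.
# This small subset is an illustration, not the full data set.
d <- data.frame(
  logDIOX = c(6.8917, 7.4904, 5.6018, 5.9989, 5.7737, 5.4079),
  OXYGEN  = factor(c("L", "H", "N", "N", "H", "L")),
  LOAD    = factor(c("H", "H", "N", "N", "L", "L"))
)

# Cross-tabulate the factors: level "N" only ever occurs together with "N".
tab <- table(OXYGEN = d$OXYGEN, LOAD = d$LOAD)
print(tab)

# The dummy column for LOAD == "N" duplicates the one for OXYGEN == "N",
# so lm() cannot estimate both and returns NA for the later one.
fit <- lm(logDIOX ~ OXYGEN + LOAD, data = d)
print(coef(fit))

# alias() reports which coefficients are linear combinations of others.
print(alias(fit))
```

If the cross-table has a level that only appears with one level of another factor, as "N" does here, you can expect an NA coefficient for one of the two.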

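As for factors and their levels: with R's default treatment contrasts, the intercept is the estimate when every factor is at its first (reference) level, and each remaining coefficient is the difference between that level and the reference level of the same factor. I think this is clearer with a regression on a single factor: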

> lm1 <- lm(logDIOX ~ 1 + OXYGEN, data = df)
> summary(lm1)

Call:
lm(formula = logDIOX ~ 1 + OXYGEN, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7803 -0.7833 -0.2027  0.6597  3.1229 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.5359     0.2726  20.305   <2e-16 ***
OXYGENL      -0.4188     0.3909  -1.071    0.289    
OXYGENN      -0.1896     0.3807  -0.498    0.621    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.188 on 54 degrees of freedom
Multiple R-squared:  0.02085,   Adjusted R-squared:  -0.01542 
F-statistic: 0.5749 on 2 and 54 DF,  p-value: 0.5662

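This output says that the estimate for OXYGEN = "H" (the reference level) is the intercept, 5.5359; for OXYGEN = "L" it is 5.5359 - 0.4188 = 5.1171; and for OXYGEN = "N" it is 5.5359 - 0.1896 = 5.3463. In a model with a single factor, these are simply the mean logDIOX of each level.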
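A minimal sketch of that arithmetic, again on six rows copied from the question's data (rounded) rather than the full data set, so the numbers will differ from the output above. The names d, fit, from_coef and group_means are illustrative. It also shows one possible way to get one estimate per level directly, by dropping the intercept.

```r
# Six rows copied from the question's data (logDIOX rounded); illustration only.
d <- data.frame(
  logDIOX = c(6.8917, 7.4904, 5.6018, 5.9989, 5.7737, 5.4079),
  OXYGEN  = factor(c("L", "H", "N", "N", "H", "L"))
)

fit <- lm(logDIOX ~ 1 + OXYGEN, data = d)
b <- coef(fit)

# Reference level "H" is the intercept; other levels add their coefficient.
from_coef <- c(
  H = b[["(Intercept)"]],
  L = b[["(Intercept)"]] + b[["OXYGENL"]],
  N = b[["(Intercept)"]] + b[["OXYGENN"]]
)

# In a one-factor model these coincide with the per-level sample means.
group_means <- tapply(d$logDIOX, d$OXYGEN, mean)
print(rbind(from_coef, group_means))

# Dropping the intercept makes lm() report one estimate per level directly.
fit0 <- lm(logDIOX ~ 0 + OXYGEN, data = d)
print(coef(fit0))
```

Note that the no-intercept trick only gives per-level estimates for the first factor in the formula; any further factors are still coded as differences from their reference level.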

Hope this helps.

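Update:

Following your comment, here is how this generalizes to your full model.

When

OXYGEN = "H"
LOAD = "H"
PLANT = "KARA"
TIME = "1"
LAB = "KK"

every factor is at its reference level, so the prediction is the intercept alone:

logDIOX = 7.2983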

When

OXYGEN = "L"
LOAD = "H"
PLANT = "KARA"
TIME = "1"
LAB = "KK"

then:

logDIOX = 7.2983 - 0.4086 = 6.8897

When

OXYGEN = "L"
LOAD = "L"
PLANT = "KARA"
TIME = "1"
LAB = "KK"

then:

logDIOX = 7.2983 - 0.4086 - 1.0645 = 5.8252
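Rather than adding coefficients by hand, you could let predict() do the sums. The sketch below is only an illustration: it uses eight rows copied from the question's data (rounded, "H" and "L" levels only, two factors) instead of the full model, and the names d, fit, grid, manual_LL and pred_LL are made up for the example.

```r
# Eight rows copied from the question's data (logDIOX rounded), restricted to
# the "H"/"L" levels of two factors; illustration only, not the full model.
d <- data.frame(
  logDIOX = c(6.8917, 7.4904, 6.4952, 6.8856, 5.7737, 5.4079, 5.5080, 5.2390),
  OXYGEN  = factor(c("L", "H", "L", "H", "H", "L", "H", "L")),
  LOAD    = factor(c("H", "H", "H", "H", "L", "L", "L", "L"))
)

fit <- lm(logDIOX ~ OXYGEN + LOAD, data = d)
b <- coef(fit)

# Every combination of factor levels, with the model's prediction for each.
grid <- expand.grid(OXYGEN = levels(d$OXYGEN), LOAD = levels(d$LOAD))
grid$predicted <- predict(fit, newdata = grid)
print(grid)

# The prediction for OXYGEN = "L", LOAD = "L" is the intercept plus both
# level coefficients, exactly as in the hand calculation.
manual_LL <- b[["(Intercept)"]] + b[["OXYGENL"]] + b[["LOADL"]]
pred_LL <- grid$predicted[grid$OXYGEN == "L" & grid$LOAD == "L"]
print(c(manual = manual_LL, predict = unname(pred_LL)))
```

The first row of grid (both factors at "H") is the all-reference combination, so its prediction equals the intercept.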