Question

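How can I print the glm coefficients for all factor levels, including the reference level? summary(glm_obj) only prints the values that deviate from the reference level.

I know those are 0, but I need this for integration, i.e. to tell other systems which factor levels can occur.

Sorry if this is too simple; I couldn't find it anywhere.

Thanks

To illustrate the problem I am facing: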

> glm(Petal.Width~Species,data=iris)  

Call:  glm(formula = Petal.Width ~ Species, data = iris)  

Coefficients:
          (Intercept)  Speciesversicolor   Speciesvirginica  
                0.246              1.080              1.780  

Degrees of Freedom: 149 Total (i.e. Null);  147 Residual
Null Deviance:      86.57 
Residual Deviance: 6.157    AIC: -45.29

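The model description above yields coefficients only for versicolor and virginica, which, as Dason pointed out, is absolutely fine from the perspective of the model itself.

However, I need to share the model with another application, which must know which Species levels to expect (e.g. to issue a warning as soon as a new, unstudied level appears).

summary() gives the same result: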

> summary(glm(Petal.Width~Species,data=iris))

Call:
glm(formula = Petal.Width ~ Species, data = iris)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-0.626  -0.126  -0.026   0.154   0.474  

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.24600    0.02894    8.50 1.96e-14 ***
Speciesversicolor  1.08000    0.04093   26.39  < 2e-16 ***
Speciesvirginica   1.78000    0.04093   43.49  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.04188163)

Null deviance: 86.5699  on 149  degrees of freedom
Residual deviance:  6.1566  on 147  degrees of freedom
AIC: -45.285

Number of Fisher Scoring iterations: 2

Answer 1

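You can rewrite the summary.glm method. You can view its source by typing summary.glm at the console, or first use sink to dump the source to a file. Most display methods are written in R itself, so you should be able to browse the code and add a line where necessary.

Alternatively, you can define an extra dummy variable for the reference level and add it to the model. R will give you a warning and set the coefficient to NA. For example: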

# no coefficient for the reference level
l = lm(Sepal.Width~Species,iris)

# make a dummy for the reference level
iris$Speciessetosa = as.numeric(iris$Species == "setosa")

# you get NA for the coefficient on new dummy
l = lm(Sepal.Width~Species+Speciessetosa,iris)

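Unfortunately, you can't just set l$coefficients[4] = 0, because it will not then show up in the print method. The source code makes clear why this does not work; I suggest skimming through it.

If you really need 0 rather than NA, you can run the output through sed to change NA to 0 in the relevant lines, or save the output as an R character vector and use the built-in gsub function. If there are only a few of these, you can also change them by hand: sink the output to a file and use find-and-replace in the R editor or in an editor such as Word or Sublime.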
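
As an alternative to editing the printed output, the NA can be replaced in the coefficient vector itself. The following is only a minimal sketch, assuming a named coefficient vector is an acceptable substitute for the printed summary; the names d, fit and cf are illustrative, and the code works on a copy of iris.

```r
# Work on a copy so the built-in iris data set is left untouched
d <- iris
d$Speciessetosa <- as.numeric(d$Species == "setosa")

# The extra dummy is collinear with the other terms, so its coefficient is NA
fit <- lm(Sepal.Width ~ Species + Speciessetosa, data = d)

# Extract the named coefficient vector and replace the NA with 0
cf <- coef(fit)
cf[is.na(cf)] <- 0
print(cf)
```

The resulting vector can be written out with write.csv or similar, which avoids any text processing of the summary output.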

Answer 2

I realize this question is old, but a simple solution is to use the dummy.coef function.
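
A minimal sketch of how this might look for the model in the question. It assumes R's default treatment contrasts, and that dummy.coef dispatches to its lm method for glm objects (glm objects inherit from class "lm"); the names fit and dc are illustrative.

```r
# Fit the same model as in the question
fit <- glm(Petal.Width ~ Species, data = iris)

# dummy.coef() expands the coefficients to every factor level,
# including the reference level
dc <- dummy.coef(fit)
print(dc)

# The Species component is a named vector with one entry per level
dc$Species
```

Under treatment contrasts the entry for the reference level (setosa here) is 0, and the other entries match the coefficients printed by summary().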

I hope this helps!

Answer 3

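Answering my own question, because I think this approach fits the purpose better than the ones proposed.

Sharing a predictive model is not what the summary.glm method is meant to do, so summary(model) says nothing about the data the model is to be applied to.

There is a solution, though: use PMML, which can describe both the model and the data it should be applied to.

Example: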

> library(pmml)
> pmml(glm(Petal.Width~Species,data=iris))
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd">
 <Header copyright="Copyright (c) 2014 dmitrijsl" description="Generalized Linear     Regression Model">
  <Extension name="user" value="dmitrijsl" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.4"/>
  <Timestamp>2014-07-15 15:07:51</Timestamp>
 </Header>
 <DataDictionary numberOfFields="2">
  <DataField name="Petal.Width" optype="continuous" dataType="double"/>
  <DataField name="Species" optype="categorical" dataType="string">
   <Value value="setosa"/>
   <Value value="versicolor"/>
   <Value value="virginica"/>
  </DataField>
...

Now setosa is in the list as well, so the receiving system knows what to expect, and the model specification is there too:

...
<ParameterList>
 <Parameter name="p0" label="(Intercept)"/>
 <Parameter name="p1" label="Speciesversicolor"/>
 <Parameter name="p2" label="Speciesvirginica"/>
</ParameterList>
<FactorList>
 <Predictor name="Species"/>
</FactorList>
<CovariateList/>
<PPMatrix>
 <PPCell value="versicolor" predictorName="Species" parameterName="p1"/>
 <PPCell value="virginica" predictorName="Species" parameterName="p2"/>
</PPMatrix>
<ParamMatrix>
 <PCell parameterName="p0" df="1" beta="0.245999999999997"/>
 <PCell parameterName="p1" df="1" beta="1.08"/>
 <PCell parameterName="p2" df="1" beta="1.78"/>
</ParamMatrix>
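
If PMML is more than the receiving side needs, the fitted object itself records the factor levels seen during fitting in its xlevels component. The following is a minimal sketch, assuming the other application only needs the list of levels; check_levels is a hypothetical helper written for this example, not part of any package.

```r
fit <- glm(Petal.Width ~ Species, data = iris)

# Levels of every factor predictor used in fitting, reference level included
fit$xlevels

# Hypothetical helper: warn about levels in new data that the model never saw
check_levels <- function(model, newdata) {
  unseen <- list()
  for (v in names(model$xlevels)) {
    extra <- setdiff(unique(as.character(newdata[[v]])), model$xlevels[[v]])
    if (length(extra) > 0) {
      warning("Unseen level(s) in ", v, ": ", paste(extra, collapse = ", "))
      unseen[[v]] <- extra
    }
  }
  invisible(unseen)
}

# Example: one known level and one level that was not in the training data
new_obs <- data.frame(Species = c("setosa", "unknown"))
check_levels(fit, new_obs)
```

The helper returns the unseen levels invisibly, so a caller can either rely on the warning or inspect the returned list.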

Printing all glm coefficients in R