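This question is about how GLM prints its results, specifically the order in which the coefficients are printed. By "order" I do not mean any statistical sense of the term. Consider the following Python code: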
import pandas as pd
import patsy
import statsmodels.api as sm
# Load the diamonds data set
df = pd.read_csv("http://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/diamonds.csv")
# Build the response and design matrix from the formula, then fit a Gaussian GLM
y, X = patsy.dmatrices("price ~ cut", data=df)
sm.GLM(y, X, family=sm.families.Gaussian()).fit().summary()
...and produces the output below, in which the categories are ordered as follows:
(Fair), Good, Ideal, Premium, Very Good
====================================================================================
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
Intercept         4358.7578     98.788     44.122      0.000      4165.137  4552.379
cut[T.Good]       -429.8933    113.849     -3.776      0.000      -653.034  -206.753
cut[T.Ideal]      -901.2158    102.412     -8.800      0.000     -1101.939  -700.493
cut[T.Premium]     225.4999    104.395      2.160      0.031        20.889   430.111
cut[T.Very Good]  -376.9979    105.164     -3.585      0.000      -583.116  -170.880
====================================================================================
I would like them to be ordered as follows:
(Fair), Good, Very Good, Premium, Ideal
In R, this can be done by setting the factor levels explicitly:
df <- read.table(file = "http://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/diamonds.csv",
                 sep = ",", header = TRUE)
df$cut <- factor(df$cut, levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"))
glm(price ~ cut, data = df, family = gaussian)
Note that the order in the output follows the factor ordering:
(Fair), Good, Very Good, Premium, Ideal
Call:  glm(formula = price ~ cut, family = gaussian, data = df)
Coefficients:
 (Intercept)       cutGood  cutVery Good    cutPremium      cutIdeal
      4358.8        -429.9        -377.0         225.5        -901.2
How can I do this in Python?