I use statsmodels to create some regression output:
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col
import numpy as np
import pandas as pd
x1 = pd.Series(np.random.randn(2000))
x2 = pd.Series(np.random.randn(2000))
aa_milne_arr = ['a', 'b', 'c', 'd', "e", "f", "g", "h", "i"]
dummy = pd.Series(np.random.choice(aa_milne_arr, 2000,))
depen = pd.Series(np.random.randn(2000))
df = pd.DataFrame({"y": depen, "x1": x1, "x2": x2, "dummy": dummy})
df['const'] = 1
df['xsqr'] = df['x1']**2
mod = smf.ols('y ~ x1 + x2 + dummy', data=df)
mod2 = smf.ols('y ~ x1 + x2 + xsqr + dummy', data=df)
res = mod.fit()
res2 = mod2.fit()
print (summary_col([res,res2],stars=True,float_format='%0.3f',
model_names=['one\n(0)','two\n(1)'],
info_dict={'N':lambda x: "{0:d}".format(int(x.nobs)),
'R2':lambda x: "{:.2f}".format(x.rsquared)}))
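This works well, but I have a large data set with many dummy variables (far more than in the example). I would therefore like to exclude the dummy variables from the summary output (but not from the regression itself). Is that possible somehow?
Answer 0 (score: 1)
A quick and dirty way is to first find the indexes of the dummy lines in the final summary_col output, and then skip them when printing: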
summary = summary_col(
[res,res2],stars=True,float_format='%0.3f',
model_names=['one\n(0)','two\n(1)'],
info_dict={'N':lambda x: "{0:d}".format(int(x.nobs)),
'R2':lambda x: "{:.2f}".format(x.rsquared)})
# As string
# summary_str = str(summary).split('\n')
# LaTeX format
summary_str = summary.as_latex().split('\n')
# Find dummy indexes
dummy_idx = []
for i, li in enumerate(summary_str):
    if li.startswith('dummy'):
        # Skip the coefficient line and the standard-error line below it
        dummy_idx.append(i)
        dummy_idx.append(i + 1)
# Print summary avoiding dummy indexes
for i, li in enumerate(summary_str):
    if i not in dummy_idx:
        print(li)
It's not pretty, but it works. With the string format:
==========================
           one      two
           (0)      (1)
--------------------------
Intercept 0.029    -0.000
          (0.065)  (0.068)
x1        0.023    0.025
          (0.022)  (0.022)
x2        -0.014   -0.014
          (0.022)  (0.022)
xsqr               0.024
                   (0.016)
N         2000     2000
R2        0.00     0.00
==========================
Standard errors in
parentheses.
* p<.1, ** p<.05, ***p<.01
With the LaTeX format:
\begin{table}
\caption{}
\begin{center}
\begin{tabular}{lcc}
\hline
& one & two \\
& (0) & (1) \\
\hline
\hline
\end{tabular}
\begin{tabular}{lll}
Intercept & 0.070 & 0.067 \\
& (0.069) & (0.071) \\
x1 & 0.001 & 0.001 \\
& (0.022) & (0.022) \\
x2 & -0.024 & -0.025 \\
& (0.022) & (0.022) \\
xsqr & & 0.003 \\
& & (0.015) \\
N & 2000 & 2000 \\
R2 & 0.01 & 0.01 \\
\hline
\end{tabular}
\end{center}
\end{table}
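The line filtering above can be wrapped in a small reusable helper. This is only a sketch: the names drop_rows, lines_per_term and sample are hypothetical, and it assumes, as the answer does, that each coefficient row is immediately followed by exactly one standard-error row.

```python
def drop_rows(summary_text, prefix, lines_per_term=2):
    """Return summary_text without the rows whose label starts with prefix.

    Assumption: every matching coefficient row is followed by its
    standard-error row, so lines_per_term lines are dropped per match.
    """
    kept = []
    skip = 0
    for line in summary_text.split("\n"):
        if line.startswith(prefix):
            skip = lines_per_term
        if skip > 0:
            skip -= 1
            continue
        kept.append(line)
    return "\n".join(kept)


# Hand-written sample rows (not real regression output) to show the effect
sample = "\n".join([
    "Intercept  0.029",
    "           (0.065)",
    "dummy[T.b] 0.010",
    "           (0.090)",
    "dummy[T.c] -0.020",
    "           (0.091)",
    "x1         0.023",
    "           (0.022)",
])
result = drop_rows(sample, "dummy")
print(result)
```

With a real table this would be called as drop_rows(str(summary), 'dummy') or drop_rows(summary.as_latex(), 'dummy').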
Answer 1 (score: 0)
I would use the regressor_order parameter of summary_col, which lets you specify which regressors are shown first (and, if you specify drop_omitted=True, omits the remaining ones entirely).
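A minimal sketch of the regressor_order approach, reusing the question's setup with a smaller synthetic sample. The seed, the sample size, the four dummy levels and the keep list are assumptions made for illustration; regressor_order and drop_omitted are keyword arguments of summary_col.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Synthetic data in the spirit of the question (seed and size are arbitrary)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "y": rng.standard_normal(n),
    "x1": rng.standard_normal(n),
    "x2": rng.standard_normal(n),
    "dummy": rng.choice(list("abcd"), n),
})
df["xsqr"] = df["x1"] ** 2

# The dummies stay in both regressions
res = smf.ols("y ~ x1 + x2 + dummy", data=df).fit()
res2 = smf.ols("y ~ x1 + x2 + xsqr + dummy", data=df).fit()

# Only the regressors listed here are kept in the printed table
keep = ["Intercept", "x1", "x2", "xsqr"]
summary = summary_col(
    [res, res2], stars=True, float_format="%0.3f",
    model_names=["one", "two"],
    regressor_order=keep, drop_omitted=True,
)
text = str(summary)
print(text)
```

Unlike the line-filtering approach, this leaves the table as a summary object, so as_latex() can still be called on it afterwards.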