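I want to use statsmodels.api to run a linear regression on every possible pairwise combination of columns in a DataFrame.
I was able to do this with the following code.
For a DataFrame df: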
import statsmodels.api as sm
import numpy as np
import pandas as pd
#generate example Dataframe
df = pd.DataFrame(abs(np.random.randn(50, 4)*10), columns=list('ABCD'))
#extract all possible combinations of columns by column index number
i, j = np.tril_indices(df.shape[1], -1)
#loop over each pairwise combination, creating a model variable and running the regression
for idx, item in enumerate(zip(i, j)):
    exec("model" + str(idx) + " = sm.OLS(df.iloc[:," + str(item[0]) + "],df.iloc[:," + str(item[1]) + "])")
    exec("regre_result" + str(idx) + " = model" + str(idx) + ".fit()")
regre_result0.summary()
                            OLS Regression Results

Dep. Variable:                      B   R-squared:                       0.418
Model:                            OLS   Adj. R-squared:                  0.406
Method:                 Least Squares   F-statistic:                     35.17
Date:                Tue, 09 Jan 2018   Prob (F-statistic):           3.00e-07
Time:                        14:16:25   Log-Likelihood:                -174.29
No. Observations:                  50   AIC:                             350.6
Df Residuals:                      49   BIC:                             352.5
Df Model:                           1
Covariance Type:            nonrobust

                 coef    std err          t      P>|t|      [0.025      0.975]
A              0.7189      0.121      5.930      0.000       0.475       0.962

Omnibus:                       14.290   Durbin-Watson:                   1.828
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               16.289
Skew:                           1.101   Prob(JB):                     0.000290
Kurtosis:                       4.722   Cond. No.                         1.00
It works, but I suspect there is a simpler way to get a similar result. Can anyone point out the best way to do this?
Answer 0 (score: 1)
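Why do this with exec and a large number of variables instead of just appending to a list?
You can also use itertools.combinations to get all column pairs.
Try something like this: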
In [1]: import itertools
In [2]: import pandas as pd
In [3]: daf = pd.DataFrame(columns=list('ABCD'))
In [4]: list(itertools.combinations(daf.columns, 2))
Out[4]: [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
In [5]: col_pairs = list(itertools.combinations(daf.columns, 2))
In [6]: models = []
In [7]: results = []
In [8]: for a, b in col_pairs:
   ...:     model = get_model(df[a], df[b])
   ...:     models.append(model)
   ...:     result = get_result(model)
   ...:     results.append(result)
In [9]: results[0].summary()
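Here get_model would call sm.OLS and get_result would call fit (or you can simply call them directly in the loop instead of wrapping them in separate functions). But don't do it the exec way; best practice is to avoid using exec.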