Question

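I have 16 dataframes: CGdf_2001, CGdf_2002 and so on up to CGdf_2016. I want to run a regression on each of these dataframes in a loop. How can I do that?

CGdf_2001 has the columns TSR_2001 and sector profit
CGdf_2002 has the columns TSR_2002 and sector profit

and so on.

My regression formula is

TSR_2001 ~ sector profit, data = CGdf_2001

I want to run this formula on all of the dataframes at once.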

Answer 1

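Note: this is a Python 3 example, but you can remove the parentheses '(' and ')' from the print statements and test it in Python 2.

Below is an example that does what you need: it generates some dataframes, puts them in a list and fits the model to each one. The code is below. Since you did not provide a data sample, my answer only uses dataframes whose variables/columns have the same names. If your variables differ between dataframes, you will need to parameterize the formula. For you, TSR_2001 ~ sector profit, data = CGdf_2001 could become: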

sm.ols(formula="{} ~ {} + {}".format(*df[x].columns[:3]), data=df[x]).fit()

OR

sm.ols(formula="{} ~ sector + profit".format(df[x].columns[0]), data=df[x]).fit()
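Because the formula is a string of column names, one way to parameterize it for any number of regressors is a small helper that reads the names from the dataframe. This is a minimal sketch, assuming the first column of each dataframe is the response, every other column is a regressor, and all column names are valid Python identifiers (no spaces). The helper name `build_formula` and the example values are hypothetical.

```python
import pandas as pd


def build_formula(frame):
    """Return 'response ~ x1 + x2 + ...' built from a dataframe's column order.

    Assumes the first column is the response and all remaining columns are
    regressors, and that every column name is a valid Python identifier.
    """
    response = frame.columns[0]
    regressors = " + ".join(frame.columns[1:])
    return "{} ~ {}".format(response, regressors)


# Hypothetical frame shaped like CGdf_2001 (the values are made up).
example = pd.DataFrame({
    "TSR_2001": [0.1, 0.3, 0.2],
    "sector": ["a", "b", "a"],
    "profit": [1.0, 2.5, 1.7],
})
print(build_formula(example))  # TSR_2001 ~ sector + profit
```

Inside the loop it would be used as sm.ols(formula=build_formula(df[x]), data=df[x]).fit(). If a column name does contain a space (for example if "sector profit" is really a single column), renaming it first, e.g. with frame.rename(columns={"sector profit": "sector_profit"}), keeps the generated formula valid.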

df[x].iloc[:,0]  # first column of the dataframe, e.g. TSR_2001 for dataframe one and TSR_200n for dataframe n
df[x].iloc[:,1]  # second column
df[x].iloc[:,2]  # third column

Here is an example:

import pandas as pd
import statsmodels.formula.api as sm

#define dataframes, data for the model
df1 = pd.DataFrame({"A": [10, 20, 30, 40, 50], "B": [20, 30, 10, 40, 50], "C": [32, 234, 23, 23, 42523]})
df2 = pd.DataFrame({"A": [10, 3, 30, 40, 50], "B": [
                  20, 30, 10, 40, 50], "C": [32, 3, 23, 23, 42523]})
df3 = pd.DataFrame({"A": [10, 5, 30, 40, 50], "B": [
                  20, 5, 10, 40, 50], "C": [32, 5, 23, 23, 42523]})

# generate the list of dataframes
df = [df1, df2, df3]

# store the fitted model for each dataframe
results = {}

# iterate over every dataframe (list indices start at 0) and fit the model to each one
for x in range(len(df)):
    results[x] = sm.ols(formula="A ~ B + C", data=df[x]).fit()
    print('Parameters of the regression model are:', results[x].params)
    print()
    print('The summary of the regression is:', results[x].summary())
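Applied to the question's setup, a minimal sketch might look like the following. It assumes the 16 dataframes are collected in a dict keyed by year, that each one has a response column named TSR_<year> and regressor columns named sector and profit, and it uses randomly generated numbers in place of the real data. The helper make_frame is a hypothetical stand-in for CGdf_2001 through CGdf_2016.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)


def make_frame(year, n=20):
    # Hypothetical stand-in for CGdf_<year>: random data, not real figures.
    profit = rng.normal(size=n)
    sector = rng.choice(["energy", "finance", "tech"], size=n)
    tsr = 0.5 * profit + rng.normal(scale=0.1, size=n)
    return pd.DataFrame({"TSR_{}".format(year): tsr, "sector": sector, "profit": profit})


# With the real data this dict would map each year to its dataframe,
# e.g. {2001: CGdf_2001, 2002: CGdf_2002} and so on through 2016.
frames = {year: make_frame(year) for year in range(2001, 2017)}

results = {}
for year, frame in frames.items():
    # The response column name changes with the year, so build the formula per frame.
    formula = "TSR_{} ~ sector + profit".format(year)
    results[year] = smf.ols(formula, data=frame).fit()

# One row of estimated coefficients per year, for side-by-side comparison.
coefs = pd.DataFrame({year: res.params for year, res in results.items()}).T
print(coefs)
```

Keeping the fitted results in a dict keyed by year means results[2005].summary() gives the full output for a single year, while coefs puts the estimates for all years in one table. A string column such as sector is treated as categorical by the formula interface, so it appears as dummy-variable coefficients.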
