I have the following data frames in a list:
CGdfs = [CGdf_2002, CGdf_2003, CGdf_2004, CGdf_2005, CGdf_2006, CGdf_2007, CGdf_2008, CGdf_2009, CGdf_2010, CGdf_2011, CGdf_2012, CGdf_2013, CGdf_2014]
The columns in each data frame are:
CGdf_2002 has the columns: TSR_df_03_06, board_gender_diversity_percent, gics_sector_name, custom_region
CGdf_2003 has the columns: TSR_df_04_07, board_gender_diversity_percent, gics_sector_name, custom_region
CGdf_2014 has the columns: TSR_df_15_18, board_gender_diversity_percent, gics_sector_name, custom_region
...
I also have the TSR column names in a list:
TSR3yrdfs_string = ['TSR_df_03_06', 'TSR_df_04_07', 'TSR_df_05_08', 'TSR_df_06_09', 'TSR_df_07_10', 'TSR_df_08_11', 'TSR_df_09_12', 'TSR_df_10_13','TSR_df_11_14', 'TSR_df_12_15','TSR_df_13_16','TSR_df_14_17', 'TSR_df_15_18']
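The TSR column names follow a fixed pattern relative to each data frame's year (CGdf_2002 uses TSR_df_03_06, CGdf_2014 uses TSR_df_15_18), so the list does not have to be typed by hand. A minimal sketch, assuming that pattern holds for every year; the helper name tsr_column is hypothetical:

```python
# Hypothetical helper: derive the TSR column name from a data frame's base year.
# Assumes the pattern seen above: base year 2002 -> 'TSR_df_03_06'
# (the window starts one year later and ends four years later, as two-digit years).
def tsr_column(base_year):
    start = (base_year + 1) % 100
    end = (base_year + 4) % 100
    return f"TSR_df_{start:02d}_{end:02d}"


# One name per data frame, CGdf_2002 through CGdf_2014.
TSR3yrdfs_string = [tsr_column(year) for year in range(2002, 2015)]
print(TSR3yrdfs_string[0], TSR3yrdfs_string[-1])  # TSR_df_03_06 TSR_df_15_18
```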
I want to run a regression on each of these data frames with the following formulas:
import statsmodels.formula.api as sm
sm.ols(formula='TSR_df_03_06 ~ board_gender_diversity_percent + gics_sector_name + custom_region', data=CGdf_2002).fit()
sm.ols(formula='TSR_df_04_07 ~ board_gender_diversity_percent + gics_sector_name + custom_region', data=CGdf_2003).fit()
sm.ols(formula='TSR_df_05_08 ~ board_gender_diversity_percent + gics_sector_name + custom_region', data=CGdf_2004).fit()
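Each data frame needs a different formula (only the dependent TSR column changes). I want to run all of these regressions in a loop, up to CGdf_2014.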
Can anyone suggest how to do this?
I tried the following, but it is invalid syntax:
CGdfs = [CGdf_2002, CGdf_2003, CGdf_2004, CGdf_2005, CGdf_2006, CGdf_2007, CGdf_2008, CGdf_2009, CGdf_2010, CGdf_2011, CGdf_2012, CGdf_2013, CGdf_2014, CGdf_2015, CGdf_2016, CGdf_2017, CGdf_2018]
TSR3yrdfs_string = ['TSR_df_03_06', 'TSR_df_04_07', 'TSR_df_05_08', 'TSR_df_06_09', 'TSR_df_07_10', 'TSR_df_08_11', 'TSR_df_09_12', 'TSR_df_10_13','TSR_df_11_14', 'TSR_df_12_15','TSR_df_13_16','TSR_df_14_17', 'TSR_df_15_18']
for x, y in zip(CGdfs, TSR3yrdfs_string):
    results = sm.ols(formula = x[y] ~ x['board_gender_diversity_percent'] + x['gics_sector_name'] + x['custom_region'], data=x).fit()
    print('The summary of regression is:', results.summary())
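Separately from the syntax error, CGdfs in this attempt has 17 entries (through CGdf_2018) while TSR3yrdfs_string has 13. zip() stops at the shorter input, so the last four data frames would be skipped without any error. A small sketch that uses name strings as hypothetical stand-ins for the data frames:

```python
# Hypothetical stand-ins for the real data frames: only the lengths matter here.
frames = [f"CGdf_{year}" for year in range(2002, 2019)]  # 17 items, as in the attempt
targets = [f"TSR_df_{(y + 1) % 100:02d}_{(y + 4) % 100:02d}"
           for y in range(2002, 2015)]                    # 13 items

pairs = list(zip(frames, targets))
print(len(frames), len(targets), len(pairs))  # 17 13 13 -> zip stops at the shorter list
print(pairs[-1])                              # ('CGdf_2014', 'TSR_df_15_18')

# A cheap guard that reports the mismatch instead of silently dropping data frames.
if len(frames) != len(targets):
    print(f"warning: {len(frames) - len(targets)} data frame(s) have no TSR column name")
```

On Python 3.10 and later, zip(frames, targets, strict=True) raises ValueError on a length mismatch instead of truncating.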
Answer:
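You need to pass formula as a string. Your formula is instead an expression built from column lookups such as x[y], x['gics_sector_name'], ..., joined by a bare ~ that is not inside a string. Python only defines ~ as a unary operator, so x[y] ~ x[...] cannot even be parsed.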
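You can confirm that the failure happens at parse time, before statsmodels is involved, by compiling both versions as source text. A small standard-library sketch; the two source strings are illustrative:

```python
# The attempted expression, as source text. Python only has a unary `~`,
# so `a ~ b` cannot be parsed and compile() raises SyntaxError.
bad_source = "x[y] ~ x['board_gender_diversity_percent']"
try:
    compile(bad_source, "<attempt>", "eval")
    bad_ok = True
except SyntaxError:
    bad_ok = False

# Inside a string literal, `~` is just a character. The formula parser used by
# statsmodels' formula API interprets it later, so this line is valid Python.
good_source = "formula = 'TSR_df_03_06 ~ board_gender_diversity_percent'"
compile(good_source, "<fixed>", "exec")  # no exception

print(bad_ok)  # False
```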
You can rewrite the formula like this instead (the formula_str variable is only there for readability):
formula_str = y + '~' + 'board_gender_diversity_percent + gics_sector_name + custom_region'
results = sm.ols(formula=formula_str, data=x).fit()
Here y is a string from the TSR3yrdfs_string list, and the other columns are hard-coded into a single string.