ValueError: endog and exog matrices are different sizes - how to drop data in specific columns only?

Time: 2018-12-08 15:55:47

Tags: python pandas dataframe regression

I am trying to run a multivariate regression and get this error:

"ValueError: endog and exog matrices are different sizes"

My code snippet is below:

df_raw = pd.DataFrame(data=df_raw)

y = (df_raw['daily pct return']).astype(float)
x1 = (df_raw['Excess daily return']).astype(float)
x2 = (df_raw['Excess weekly return']).astype(float)
x3 = (df_raw['Excess monthly return']).astype(float)
x4 = (df_raw['Trading vol / mkt cap']).astype(float)
x5 = (df_raw['Std dev']).astype(float)
x6 = (df_raw['Residual risk']).astype(float)

y = y.replace([np.inf, -np.inf],np.nan).dropna()

print(y.shape)
print(x1.shape)
print(x2.shape)
print(x3.shape)
print(x4.shape)
print(x5.shape)
print(x6.shape)


df_raw.to_csv('Raw_final.csv', header=True)

result = smf.OLS(exog=y, endog=[x1, x2, x3, x4, x5, x6]).fit()
print(result.params)
print(result.summary())

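As you can see from my code, I am checking the shape of each variable. I get the following output, which shows that the error arises because the y variable has only 48392 values while all the other variables have 48393: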

(48392,) (48393,) (48393,) (48393,) (48393,) (48393,) (48393,)

My DataFrame looks like this:

  daily pct return | Excess daily return | weekly pct return | index weekly pct return | Excess weekly return | monthly pct return | index monthly pct return | Excess monthly return | Trading vol / mkt cap |   Std dev   
 ------------------|---------------------|-------------------|-------------------------|----------------------|--------------------|--------------------------|-----------------------|-----------------------|------------- 
                   |                     |                   |                         |                      |                    |                          |                       |           0.207582827 |             
       0.262658228 |         0.322397801 |                   |                         |                      |                    |                          |                       |           0.285585677 |             
       0.072681704 |         0.126445534 |                   |                         |                      |                    |                          |                       |           0.272920624 |             
       0.135514019 |         0.068778682 |                   |                         |                      |                    |                          |                       |           0.213149083 |             
      -0.115226337 |        -0.173681889 |                   |                         |                      |                    |                          |                       |           0.155653699 |             
      -0.165116279 |        -0.176569405 |                   |                         |                      |                    |                          |                       |           0.033925024 |             
       0.125348189 |         0.079889239 |                   |                         |                      |                    |                          |                       |           0.030968484 | 0.544133212 
       0.022277228 |        -0.044949678 |                   |                         |                      |                    |                          |                       |           0.020735381 | 0.385659608 
       0.150121065 |         0.102119782 |                   |                         |                      |                    |                          |                       |           0.063563881 | 0.430868447 
       0.336842105 |         0.333590483 |                   |                         |                      |                    |                          |                       |           0.210193049 | 0.893734807 
       0.011023622 |        -0.011860658 |       0.320987654 |            -0.657089012 |          0.978076666 |                    |                          |                       |           0.100468109 | 1.137976483 
        0.37694704 |         0.308505907 |                   |                         |                      |                    |                          |                       |           0.135828281 | 1.867394416 

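Can anyone help me fix the matrix size problem so that I no longer get this error? I think I need to remove the first row of values from every variable apart from y ('daily pct return'), but I am not sure how to do that.

Thanks in advance!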

2 answers:

Answer 0: (score: 0)

I assume you want to discard all data associated with infinite y values.

df_raw = pd.DataFrame(data=df_raw)

df_raw['daily pct return'] = df_raw['daily pct return'].astype(float).replace([np.inf, -np.inf], np.nan)
df_raw = df_raw.dropna()

Then proceed with the regression.
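
One caveat with a bare `dropna()`: it removes every row that has an empty cell in any column, and the table in the question shows mostly empty weekly and monthly columns. A minimal sketch of restricting the drop to the columns that go into the regression, using `dropna(subset=...)`, is below. The tiny DataFrame and the `model_cols` list are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of df_raw: the first row is empty, one y value is
# infinite, and an unrelated column is empty everywhere.
df_raw = pd.DataFrame({
    "daily pct return": [np.nan, 0.26, np.inf, 0.14],
    "Excess daily return": [np.nan, 0.32, 0.13, 0.07],
    "weekly pct return": [np.nan, np.nan, np.nan, np.nan],
})

# Columns that actually enter the regression (an assumption for this sketch).
model_cols = ["daily pct return", "Excess daily return"]

# Treat +/-inf as missing, then drop rows only where a model column is missing.
cleaned = df_raw.replace([np.inf, -np.inf], np.nan)
subset_rows = cleaned.dropna(subset=model_cols)

# For comparison: a bare dropna() also looks at the unrelated empty column.
all_rows = cleaned.dropna()

print(len(subset_rows), len(all_rows))  # prints: 2 0
```

Because the rows are dropped from the DataFrame rather than from `y` alone, every column taken from `subset_rows` afterwards has the same length.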

Answer 1: (score: 0)

Finally solved this! There were three problems:

1) The y variable had 48392 values, while the other 6 variables each had 48393. To fix this, I added the following line to drop the first row:

df_raw = df_raw.drop([0])

2) My DataFrame had a lot of empty cells. You cannot run a regression unless every cell has a value, so I added code that replaces all infs and empty cells with NaN and then fills every NaN with 0. Code snippet:

df_raw['daily pct return'] = df_raw['daily pct return'].replace([np.inf, -np.inf], np.nan)
df_raw = df_raw.replace(r'\s+', np.nan, regex=True).replace('', np.nan)
df_raw.fillna(value=0, axis=1, inplace=True)

3) The way I had written the multiple regression formula was wrong. I corrected it as follows:

result = smf.ols(formula='y ~ x1 + x2 + x3 + x4 + x5 + x6', data=df_raw).fit()

Altogether, my updated code applies these three changes together.
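
A minimal sketch of how the cleaning and the formula-based fit might fit together end to end is below. It is an illustration under assumptions, not the original updated code: the toy data, the `cols` mapping that renames columns to `y`, `x1`, `x5`, and the `model_df` name are all hypothetical, and it drops incomplete rows with `dropna()` instead of filling them with 0 (swap in `fillna(0)` to mirror step 2).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data standing in for df_raw (three columns only).
rng = np.random.default_rng(0)
n = 50
df_raw = pd.DataFrame({
    "daily pct return": rng.normal(size=n),
    "Excess daily return": rng.normal(size=n),
    "Std dev": rng.normal(size=n),
})
df_raw.loc[0, "daily pct return"] = np.nan  # first row has no return
df_raw.loc[3, "Std dev"] = np.inf           # an infinite value to clean up

# Rename the columns to identifiers the formula can reference directly
# (the short names are an assumption made for this sketch).
cols = {"daily pct return": "y", "Excess daily return": "x1", "Std dev": "x5"}

# Clean all model columns together so y and the regressors stay aligned.
model_df = (
    df_raw[list(cols)]
    .rename(columns=cols)
    .astype(float)
    .replace([np.inf, -np.inf], np.nan)
    .dropna()
)

# The formula interface reads y and the regressors from the same DataFrame.
result = smf.ols(formula="y ~ x1 + x5", data=model_df).fit()
print(result.params)
```

Since the dependent and independent variables come from one cleaned DataFrame, their sizes cannot drift apart the way they did when `dropna()` was applied to `y` alone.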