Question

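I'd like to know whether I can apply the pandas.ols model to a DataFrame of several response variables at once, all regressed against a single independent variable.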

Suppose I have the following:

In [109]: y=pandas.DataFrame(np.random.randn(10,4))
In [110]: x=pandas.DataFrame(np.random.randn(10,1))

I'd like to do something like this:

In [111]: model=pandas.ols(y=y, x=x)

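Basically, the result would be the output of four models, or at least access to the four sets of coefficients. If possible, I'd prefer to avoid looping over the response variables.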

Answer 1

I think this should do it.

import numpy as np
import pandas as pd

# First generate the data
x = pd.DataFrame(np.random.randn(10, 1))
y = pd.DataFrame(np.random.randn(10, 4))

# Since we are doing things manually, we need to add the constant term to the x matrix
x[1] = np.ones(10)

# Precompute (X'X)^-1 X', which we premultiply the y matrix by to get the results
tmpmat = np.dot(np.linalg.pinv(np.dot(x.T, x)), x.T)

# Solve for the betas: one column of coefficients per response variable
betamatrix = np.dot(tmpmat, y)

# Compare with the pandas output one column at a time.
# Note: pd.ols was removed in pandas 0.20, so this check needs an older release.
model0 = pd.ols(y=y[0], x=x, intercept=False)
model1 = pd.ols(y=y[1], x=x, intercept=False)
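
The same coefficients can probably also be obtained from a single least-squares call, since `np.linalg.lstsq` accepts a two-dimensional right-hand side and solves for every column together. This is only a minimal sketch, assuming NumPy and pandas are installed. The fixed seed, the row labels `slope` and `const`, and the variable names are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = pd.DataFrame(rng.standard_normal((10, 1)))
y = pd.DataFrame(rng.standard_normal((10, 4)))

# Design matrix: the single regressor plus a column of ones for the intercept.
X = np.column_stack([x.to_numpy().ravel(), np.ones(len(x))])

# lstsq minimises ||X b - y_k|| for every column y_k of y in one call.
coef, _, _, _ = np.linalg.lstsq(X, y.to_numpy(), rcond=None)

# Rows are the model terms, columns are the four response variables.
betas = pd.DataFrame(coef, index=["slope", "const"], columns=y.columns)
print(betas)
```

Each column of `betas` holds the slope and intercept for one response variable, so `betas[2]` would give the fit for the third column of `y`. Unlike pandas.ols, this returns only the coefficients, with no standard errors or other summary statistics.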

Answer 2

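I've done this many times and haven't found an alternative to looping. The following code stores the results of the four regressions in a dictionary. If you're only interested in certain coefficients, you can capture them as you loop over the regressions.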

model = {}
for i in y:
    model[i] = pd.ols(y=y[i], x=x)
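
To illustrate capturing only the coefficients inside the loop, here is a rough sketch that keeps the slope and intercept of each regression in a dictionary. It assumes NumPy's `np.polyfit` (degree 1) as a stand-in for pandas.ols, which is no longer available from pandas 0.20 onward. The seed and the key names `x` and `intercept` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = pd.DataFrame(rng.standard_normal((10, 1)))
y = pd.DataFrame(rng.standard_normal((10, 4)))

coefs = {}
for i in y:  # iterating over a DataFrame yields its column labels
    # A degree-1 polynomial fit returns the slope first, then the intercept.
    slope, intercept = np.polyfit(x[0], y[i], deg=1)
    coefs[i] = {"x": slope, "intercept": intercept}

# One column per response variable, one row per coefficient.
summary = pd.DataFrame(coefs)
print(summary)
```

The loop is still there, but only two numbers per regression are kept rather than a full model object.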

Using pandas.ols on multiple dependent variables at once
