Question: OLS using statsmodels.formula.api vs. statsmodels.api

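Can someone explain the difference between ols in statsmodels.formula.api and OLS in statsmodels.api?

Using the Advertising data from the ISLR text, I ran OLS with both and got different results. I then compared them with scikit-learn's LinearRegression.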

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

df = pd.read_csv(r"C:\...\Advertising.csv")

x1 = df.loc[:, ['TV']]
y1 = df.loc[:, ['Sales']]

print("Statsmodel.Formula.Api Method")
model1 = smf.ols(formula='Sales ~ TV', data=df).fit()
print(model1.params)

print("\nStatsmodel.Api Method")
model2 = sm.OLS(y1, x1)
results = model2.fit()
print(results.params)

print("\nSci-Kit Learn Method")
model3 = LinearRegression()
model3.fit(x1, y1)
print(model3.coef_)
print(model3.intercept_)
```

The output is:

```
Statsmodel.Formula.Api Method
Intercept    7.032594
TV           0.047537
dtype: float64

Statsmodel.Api Method
TV    0.08325
dtype: float64

Sci-Kit Learn Method
[[ 0.04753664]]
[ 7.03259355]
```

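The statsmodels.api method returns a different TV coefficient than both the statsmodels.formula.api method and the scikit-learn method.

What OLS algorithm does statsmodels.api run that produces a different result? Does anyone have a link to documentation that could help answer this?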

Answer 1

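The difference is due to whether an intercept is present:

In statsmodels.formula.api, similarly to R, a constant is automatically added to your data and an intercept is fitted.

In statsmodels.api, you have to add a constant yourself (see the documentation here). Try using add_constant from statsmodels.api: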
```
x1 = sm.add_constant(x1)
```

Answer 2

I ran into this issue today and want to elaborate on @stellasia's answer, because the statsmodels documentation can be a bit ambiguous.

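Unless you are using actual R-style string formulas when instantiating OLS, you need to add a constant (literally a column of 1s) under both statsmodels.formula.api and plain statsmodels.api. @Chetan uses R-style formatting here (formula='Sales ~ TV'), so he won't run into this subtlety, but for people with some Python knowledge and no R background it can be very confusing.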

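Furthermore, it doesn't matter whether you specify the hasconst parameter when building the model. (This is kind of silly.) In other words, unless you use R-style string formulas, hasconst has no effect on the estimated coefficients and never adds an intercept for you, even though it is supposed to

[indicate] whether the RHS includes a user-supplied constant

because, as a footnote in the documentation says,

No constant is added by the model unless you are using formulas.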

The example below shows that, if you don't use R-style string formulas, both .formula.api and .api require a user-added column vector of 1s.

```python
import numpy as np
import statsmodels.api as sm

# Generate some relational data
np.random.seed(123)
nobs = 25
x = np.random.random((nobs, 2))
x_with_ones = sm.add_constant(x, prepend=False)  # column of ones appended last
beta = [.1, .5, 1]
e = np.random.random(nobs)
y = np.dot(x_with_ones, beta) + e
```

Now put x and y into Excel and run Data > Data Analysis > Regression, making sure "Constant is zero" is unchecked. You will get the following coefficients:

Intercept       1.497761024
X Variable 1    0.012073045
X Variable 2    0.623936056

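Now try running this regression on x (not x_with_ones) with either statsmodels.formula.api or statsmodels.api, with hasconst set to None, True, or False. You will see that in each of those 6 scenarios no intercept is returned (there are only 2 parameters):

```python
import statsmodels.formula.api as smf
import statsmodels.api as sm

print('smf models')
print('-' * 10)
for hc in [None, True, False]:
    # Note: smf.OLS was only exposed by older statsmodels releases;
    # on newer ones use sm.OLS, which is the same class.
    model = smf.OLS(endog=y, exog=x, hasconst=hc).fit()
    print(model.params)

# smf models
# ----------
# [ 1.46852293  1.8558273 ]
# [ 1.46852293  1.8558273 ]
# [ 1.46852293  1.8558273 ]
```

Now run it correctly, with a column vector of 1.0s added to x. You can use smf here, but it isn't really necessary if you are not using formulas:

```python
print('sm models')
print('-' * 10)
for hc in [None, True, False]:
    model = sm.OLS(endog=y, exog=x_with_ones, hasconst=hc).fit()
    print(model.params)

# sm models
# ----------
# [ 0.01207304  0.62393606  1.49776102]
# [ 0.01207304  0.62393606  1.49776102]
# [ 0.01207304  0.62393606  1.49776102]
```

These match the Excel coefficients, with the intercept as the last parameter because the column of ones was appended with prepend=False.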


Answer 3

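I had a similar problem with the Logit function. (I used patsy to create my matrices, so the intercept was already there.) My sm.Logit was not converging, while my sm.formula.logit was.

The input data were exactly the same. I changed the solver method to 'newton', and sm.Logit converged too. Is it possible that the two versions have different default solver methods?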
