I have a Python script that calls a few Stata do-files:

from subprocess import call
Stata_exec = "D:/Stata 12 MP2/StataMP-64.exe"
dofile = "D:/Test.do" 
call( "\"{0}\" do /e \"{1}\"".format(Stata_exec, dofile), shell=True)


/* Merge some big files */

clear *

// Create dataset A (8000 variables, 300 observations)
set obs 300
gen ID = _n
forval i = 1/8000 {
    gen variableA`i' = runiform()
}
tempfile dataA
save "`dataA'"

// Create dataset B (5000 variables, 300 observations)
clear
set obs 300
gen ID = _n
forval i = 1/5000 {
    gen variableB`i' = runiform()
}

sort ID

// Attempt merge
merge 1:1 ID using `dataA'
exit, clear



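You can use the log command to echo a copy of the Stata session to a file, or use the file command to write specific messages (such as "Data A Created") to a text file. Python should then be able to follow that file with subprocess.call(["tail", "-F", logfilename]).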
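Since tail is not a standard Windows command, and the script in the question runs on Windows, the same idea can be sketched in plain Python. The following is only a minimal sketch, not a tested recipe for Stata: the function name follow, its parameters, and the log file name in the usage comment are assumptions made for illustration. It polls a log file and yields each new line until the producing process has finished.

```python
import os
import time


def follow(path, is_running, poll_interval=0.5):
    """Yield lines appended to `path` until `is_running()` is false
    and everything already written has been read."""
    # Wait for the log file to appear; give up if the producer is gone.
    while not os.path.exists(path):
        if not is_running():
            return
        time.sleep(poll_interval)

    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        while True:
            # Check the producer BEFORE reading, so that output written
            # just before it exited is still picked up by this read.
            running = is_running()
            line = fh.readline()
            if line:
                yield line.rstrip("\r\n")
                continue
            if not running:
                break
            time.sleep(poll_interval)


# Hypothetical usage (not run here). The log name is an assumption:
# in batch mode Stata is expected to write <do-file name>.log to the
# working directory.
#
#   import subprocess
#   proc = subprocess.Popen([Stata_exec, "/e", "do", dofile])
#   for line in follow("Test.log", lambda: proc.poll() is None):
#       print(line)
```

Note that a line may be yielded in two pieces if Stata has only partly written it when the file is read; for progress messages this is usually acceptable.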


Many tools have emerged and matured since this question was posted. Stata and Python output can now be integrated by installing and using the Jupyter kernel for Stata:

pip install stata_kernel
python -m stata_kernel.install

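This allows kernel switching on the fly in a Jupyter Notebook running on Windows, Mac, and Linux. The combined results can then easily be saved and exported in the usual way.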

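Note that each time you switch kernels, the current one shuts down and nothing is kept in memory. To make the most of this approach, it is therefore currently best to run self-contained Stata and Python scripts. These can be in any order.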

This is of course only one approach, since Jupyter Notebook offers a lot of flexibility in how interactive code and scripts are executed.
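Because nothing survives in memory across a kernel switch, one way to pass results from a Python cell to a later Stata cell is through a file on disk. The sketch below is an illustration under assumptions: the helper write_handoff, the file name handoff.csv, and the id column are hypothetical, and the two rows simply reuse values from the auto data shown in the transcripts.

```python
import csv


def write_handoff(rows, path, fieldnames):
    """Write a list of dicts to a CSV file that a later cell can import."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical results produced in a Python cell.
results = [
    {"id": 1, "mpg": 22, "weight": 2930},
    {"id": 2, "mpg": 17, "weight": 3350},
]
write_handoff(results, "handoff.csv", ["id", "mpg", "weight"])

# After switching to the Stata kernel, a cell such as
#     import delimited "handoff.csv", clear
# should load the same rows as a Stata dataset.
```

A plain CSV file keeps the example free of extra dependencies; with pandas already loaded, writing a .dta file directly would be an alternative.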


In [1]: %load testpython1.py

In [2]: # %load testpython1.py
   ...: import pandas as pd
   ...: df = pd.read_stata('http://www.stata-press.com/data/r14/auto.dta')
   ...: print()
   ...: print(df[['mpg', 'weight', 'price']].head())
   ...: print()
   ...: print(df[['mpg','weight','price']].describe())

   mpg  weight  price
0   22    2930   4099
1   17    3350   4749
2   22    2640   3799
3   20    3250   4816
4   15    4080   7827

             mpg       weight         price
count  74.000000    74.000000     74.000000
mean   21.297297  3019.459459   6165.256757
std     5.785503   777.193567   2949.495885
min    12.000000  1760.000000   3291.000000
25%    18.000000  2250.000000   4220.250000
50%    20.000000  3190.000000   5006.500000
75%    24.750000  3600.000000   6332.250000
max    41.000000  4840.000000  15906.000000

In [1]: do teststata1.do

. sysuse auto
(1978 Automobile Data)

. list mpg weight price in 1/5

     | mpg   weight   price |
  1. |  22    2,930   4,099 |
  2. |  17    3,350   4,749 |
  3. |  22    2,640   3,799 |
  4. |  20    3,250   4,816 |
  5. |  15    4,080   7,827 |

. summarize mpg weight price

    Variable |        Obs        Mean    Std. Dev.       Min        Max
         mpg |         74     21.2973    5.785503         12         41
      weight |         74    3019.459    777.1936       1760       4840
       price |         74    6165.257    2949.496       3291      15906

end of do-file

In [1]: %load testpython2.py

In [2]: # %load testpython2.py
   ...: import pandas as pd   
   ...: import statsmodels.api as sm
   ...: df = pd.read_stata('http://www.stata-press.com/data/r14/auto.dta')
   ...: Y = df['mpg']
   ...: df['cons'] = 1
   ...: X = df[['weight', 'price', 'cons']]
   ...: reg = sm.OLS(Y, X).fit()
   ...: print(reg.summary())

                            OLS Regression Results                            
Dep. Variable:                    mpg   R-squared:                       0.653
Model:                            OLS   Adj. R-squared:                  0.643
Method:                 Least Squares   F-statistic:                     66.85
Date:                Mon, 01 Oct 2018   Prob (F-statistic):           4.73e-17
Time:                        15:36:01   Log-Likelihood:                -195.22
No. Observations:                  74   AIC:                             396.4
Df Residuals:                      71   BIC:                             403.3
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
weight        -0.0058      0.001     -9.421      0.000      -0.007      -0.005
price      -9.351e-05      0.000     -0.575      0.567      -0.000       0.000
cons          39.4397      1.622     24.322      0.000      36.206      42.673
Omnibus:                       29.900   Durbin-Watson:                   2.347
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               60.190
Skew:                           1.422   Prob(JB):                     8.51e-14
Kurtosis:                       6.382   Cond. No.                     3.00e+04

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large,  3e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

In [1]: do teststata2.do

. sysuse auto
(1978 Automobile Data)

. regress mpg weight price

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     66.85
       Model |  1595.93249         2  797.966246   Prob > F        =    0.0000
    Residual |  847.526967        71  11.9369995   R-squared       =    0.6531
-------------+----------------------------------   Adj R-squared   =    0.6434
       Total |  2443.45946        73  33.4720474   Root MSE        =     3.455

         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      weight |  -.0058175   .0006175    -9.42   0.000    -.0070489   -.0045862
       price |  -.0000935   .0001627    -0.57   0.567     -.000418    .0002309
       _cons |   39.43966   1.621563    24.32   0.000     36.20635    42.67296

end of do-file

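Admittedly, this is not the "pure" Python console pipeline asked for many years ago. Nevertheless, it is potentially a good workaround that works equally well, if not better.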