I have a Python script that calls several Stata do-files:
from subprocess import call
Stata_exec = "D:/Stata 12 MP2/StataMP-64.exe"
dofile = "D:/Test.do"
call( "\"{0}\" do /e \"{1}\"".format(Stata_exec, dofile), shell=True)
Here is a test do-file:
/* Merge some big files */
clear *
// Create dataset A (8000 variables, 300 observations)
set obs 300
gen ID = _n
forval i = 1/8000 {
gen variableA`i' = runiform()
}
tempfile dataA
save "`dataA'"
// Create dataset B (5000 variables, 300 observations)
clear
set obs 300
gen ID = _n
forval i = 1/5000 {
gen variableB`i' = runiform()
}
sort ID
// Attempt merge
merge 1:1 ID using `dataA'
exit, clear
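I would like the do-file's progress to be streamed to the console in real time, so that it integrates with the rest of the Python output.
Is this possible?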
Answer 0 (score: 1)
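You can use the log command to echo a copy of the Stata session to a file, or use the file command to write specific messages (such as "Data A Created") to a text file. Python should then be able to follow that file as it grows using subprocess.call(["tail", "-F", logfilename])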
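Since tail is a Unix utility and the paths in the question are Windows paths, a pure-Python way of following the log may be more portable. The sketch below polls the file and yields lines as they are appended. The names follow and run_and_stream are hypothetical helpers, not part of any library. It assumes you know where the log file will be written and that Stata flushes it while the do-file is still running, neither of which is verified here.

```python
import subprocess
import time
from pathlib import Path


def follow(path, is_running, poll_interval=0.2):
    """Yield lines appended to `path` until `is_running()` returns False.

    `is_running` is any zero-argument callable; for a subprocess it can be
    `lambda: proc.poll() is None`.
    """
    path = Path(path)
    # Wait for the writer to create the file.
    while not path.exists():
        if not is_running() and not path.exists():
            return
        time.sleep(poll_interval)

    with path.open("r", errors="replace") as fh:
        pending = ""       # holds a partially written line
        finished = False   # set once the writer has exited
        while True:
            chunk = fh.readline()
            if chunk:
                pending += chunk
                if pending.endswith("\n"):
                    yield pending.rstrip("\r\n")
                    pending = ""
            elif finished:
                # Writer is gone and the file is drained.
                if pending:
                    yield pending
                return
            elif not is_running():
                # Loop once more to pick up anything written just before exit.
                finished = True
            else:
                time.sleep(poll_interval)


def run_and_stream(command, logfile):
    """Start `command` and print `logfile` to the console while it runs."""
    proc = subprocess.Popen(command)
    for line in follow(logfile, lambda: proc.poll() is None):
        print(line, flush=True)
    return proc.wait()
```

It could be called as run_and_stream([Stata_exec, ...], logfilename), where the command-line arguments and the log location depend on your Stata installation and on how the do-file opens its log.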
Answer 1 (score: 1)
Since this question was posted, a number of tools have appeared and matured. Stata and Python output can now be integrated by installing and using the Jupyter kernel for Stata:
pip install stata_kernel
python -m stata_kernel.install
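This allows on-the-fly kernel switching within a Jupyter Notebook, which runs on Windows, Mac and Linux. The combined results can then be easily saved and exported in the usual ways.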
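Note that each time you switch kernels, the current kernel shuts down and nothing is kept in memory. To take full advantage of this approach, it is therefore currently best to run self-contained Stata and Python scripts. These can be run in any order.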
This is of course only one approach, since Jupyter Notebook offers a great deal of flexibility in how interactive code and scripts are executed.
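Because nothing stays in memory across a kernel switch, one possible way to pass data from a Python script to a later Stata script is through a file on disk. The following is a minimal sketch assuming pandas is installed. The file name handoff_example.dta and the helper export_for_stata are made up for illustration, and the three rows are the first rows of the auto dataset shown in the output below.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_for_stata(df, path):
    """Write `df` as a Stata .dta file so a later Stata script can load it."""
    # write_index=False keeps the pandas index out of the Stata dataset.
    df.to_stata(str(path), write_index=False)
    return Path(path)


# Hypothetical hand-off location; any path both scripts can reach would do.
handoff = Path(tempfile.gettempdir()) / "handoff_example.dta"

df = pd.DataFrame(
    {
        "mpg": [22, 17, 22],
        "weight": [2930, 3350, 2640],
        "price": [4099, 4749, 3799],
    }
)
export_for_stata(df, handoff)

# Reading the file back confirms that the data survived the round trip.
roundtrip = pd.read_stata(str(handoff))
print(roundtrip)
```

On the Stata side, the do-file would then open the same file with the use command before continuing.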
Here is a very simple example using Stata's auto toy dataset:
In [1]: %load testpython1.py
In [2]: # %load testpython1.py
...: import pandas as pd
...: df = pd.read_stata('http://www.stata-press.com/data/r14/auto.dta')
...: print()
...: print(df[['mpg', 'weight', 'price']].head())
...: print()
...: print(df[['mpg','weight','price']].describe())
...:
...:
Out[2]:
mpg weight price
0 22 2930 4099
1 17 3350 4749
2 22 2640 3799
3 20 3250 4816
4 15 4080 7827
mpg weight price
count 74.000000 74.000000 74.000000
mean 21.297297 3019.459459 6165.256757
std 5.785503 777.193567 2949.495885
min 12.000000 1760.000000 3291.000000
25% 18.000000 2250.000000 4220.250000
50% 20.000000 3190.000000 5006.500000
75% 24.750000 3600.000000 6332.250000
max 41.000000 4840.000000 15906.000000
In [1]: do teststata1.do
. sysuse auto
(1978 Automobile Data)
. list mpg weight price in 1/5
+----------------------+
| mpg weight price |
|----------------------|
1. | 22 2,930 4,099 |
2. | 17 3,350 4,749 |
3. | 22 2,640 3,799 |
4. | 20 3,250 4,816 |
5. | 15 4,080 7,827 |
+----------------------+
. summarize mpg weight price
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
mpg | 74 21.2973 5.785503 12 41
weight | 74 3019.459 777.1936 1760 4840
price | 74 6165.257 2949.496 3291 15906
.
end of do-file
In [1]: %load testpython2.py
In [2]: # %load testpython2.py
...: import pandas as pd
...: import statsmodels.api as sm
...:
...: df = pd.read_stata('http://www.stata-press.com/data/r14/auto.dta')
...:
...: Y = df['mpg']
...: df['cons'] = 1
...: X = df[['weight', 'price', 'cons']]
...:
...: reg = sm.OLS(Y, X).fit()
...: print(reg.summary())
...:
...:
OLS Regression Results
==============================================================================
Dep. Variable: mpg R-squared: 0.653
Model: OLS Adj. R-squared: 0.643
Method: Least Squares F-statistic: 66.85
Date: Mon, 01 Oct 2018 Prob (F-statistic): 4.73e-17
Time: 15:36:01 Log-Likelihood: -195.22
No. Observations: 74 AIC: 396.4
Df Residuals: 71 BIC: 403.3
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
weight -0.0058 0.001 -9.421 0.000 -0.007 -0.005
price -9.351e-05 0.000 -0.575 0.567 -0.000 0.000
cons 39.4397 1.622 24.322 0.000 36.206 42.673
==============================================================================
Omnibus: 29.900 Durbin-Watson: 2.347
Prob(Omnibus): 0.000 Jarque-Bera (JB): 60.190
Skew: 1.422 Prob(JB): 8.51e-14
Kurtosis: 6.382 Cond. No. 3.00e+04
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
In [1]: do teststata2.do
. sysuse auto
(1978 Automobile Data)
. regress mpg weight price
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 66.85
Model | 1595.93249 2 797.966246 Prob > F = 0.0000
Residual | 847.526967 71 11.9369995 R-squared = 0.6531
-------------+---------------------------------- Adj R-squared = 0.6434
Total | 2443.45946 73 33.4720474 Root MSE = 3.455
------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | -.0058175 .0006175 -9.42 0.000 -.0070489 -.0045862
price | -.0000935 .0001627 -0.57 0.567 -.000418 .0002309
_cons | 39.43966 1.621563 24.32 0.000 36.20635 42.67296
------------------------------------------------------------------------------
.
end of do-file
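Admittedly, this is not the "pure" Python console piping that was asked for many years ago. Nevertheless, it may be a good workaround that achieves the same results, if not better ones.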