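I am trying to create a toy example of piping data into a Python script that uses the multiprocessing library. This code will eventually run in a web application, so please keep that in mind when answering.
Once the JSON stream reaches a certain length, it can no longer be passed to my Python script as a command-line argument.
I want to return data from the '__main__' block, because multiprocessing needs to be run from if __name__ == '__main__':.
run.py contains the following code: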
import numpy
import pandas
import subprocess

# Create a JSON stream to pipe
def create_stream(n_rows, n_cols):
    df = pandas.DataFrame(numpy.random.randn(n_rows, n_cols))
    return df.to_json()

# Pipe the JSON stream into the multi_fun.py script
def test_multi(n_rows, n_cols):
    stream = create_stream(n_rows, n_cols)
    subprocess.call(["python", "multi_fun.py", stream])
The multi_fun.py script contains the following code:
import sys
import argparse
import pandas
from multiprocessing import Process

def get_maxes(df):
    d = {}
    for row in df.index:
        row_max = df.loc[row, :].max()
        print(row_max)
        d[row] = row_max
    return pandas.Series(d)

if __name__ == '__main__':
    usage = sys.argv[0] + " 'json' stream"
    description = "toy multiprocessing"
    parser = argparse.ArgumentParser(description=description,
                                     usage=usage)
    parser.add_argument('stream',
                        nargs=1,
                        type=str,
                        help='json stream')
    args = parser.parse_args()
    df = pandas.read_json(args.stream[0], typ='frame')
    process = Process(target=get_maxes, args=(df,))
    process.start()
    process.join()
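Note that get_maxes returns a Series, but Process discards the return value of its target, so nothing comes back to the parent. One possible way to get the result back is a multiprocessing.Queue. The following is only a minimal sketch under assumptions: it uses plain lists instead of a DataFrame so it stays self-contained, and run_in_child is a hypothetical helper name that is not part of my scripts.

```python
from multiprocessing import Process, Queue


def get_maxes(rows, out_queue):
    """Compute the maximum of each row and put the result on the queue."""
    out_queue.put({i: max(row) for i, row in enumerate(rows)})


def run_in_child(rows):
    """Run get_maxes in a child process and return what it computed."""
    out_queue = Queue()
    process = Process(target=get_maxes, args=(rows, out_queue))
    process.start()
    # Read the result before join(): a child that still has queued data
    # to flush may not exit until the parent has consumed it.
    result = out_queue.get()
    process.join()
    return result
```

Called from under the if __name__ == '__main__': guard, for example as run_in_child([[1, 5, 2], [9, 3, 4]]), this should hand back one maximum per row to the parent process.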
From IPython, when I use a small DataFrame, it works:
In [105]: import run
In [106]: run.test_multi(5,5)
1.2327553624
...
0.546843752
In [107]: run.test_multi(10, 10)
1.9811924526
1.059567146
....
1.6137051224
1.2899954045
But with larger sizes, I get this:
In [108]: run.test_multi(100, 100)
OSError: [Errno 7] Argument list too long
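As far as I can tell, this error comes from the operating system's limit on the size of the arguments handed to a new process, not from Python itself. A rough sketch for comparing a payload against that limit, assuming a POSIX system where os.sysconf exposes SC_ARG_MAX. The names payload_bytes and arg_max are hypothetical helpers, and the limit for a single argument may be lower than the total reported here.

```python
import os


def payload_bytes(text):
    """Size of the string in bytes once encoded for the command line."""
    return len(text.encode("utf-8"))


def arg_max():
    """Total argv + environment budget reported by the OS, or None if unknown."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on some platforms (e.g. Windows),
        # and the name may not be defined everywhere.
        return None
```

Comparing payload_bytes(stream) with arg_max() gives a rough idea of how close a given JSON string is to the limit, although staying under it is no guarantee that the call will succeed.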
I have tried os.popen(stream), for example:
In [109]: subprocess.call(["python", "multi_fun.py", os.popen(stream)])
and I also changed my argparse argument to type = argparse.FileType('r'), but I got:
TypeError: execv() arg 2 must contain only strings
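The TypeError arises because every element of the argument list has to be a string, and os.popen returns a file object. What I think I need instead is to send the data over the child's standard input rather than through argv. A minimal sketch of that idea, under assumptions: CHILD_CODE is a hypothetical stand-in for multi_fun.py, it uses the standard json module on plain lists instead of pandas so that it is self-contained, and row_maxes_via_stdin is an illustrative name.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for multi_fun.py: reads the whole JSON document
# from stdin instead of sys.argv and writes the row maxima to stdout.
CHILD_CODE = r"""
import json, sys
rows = json.load(sys.stdin)
json.dump([max(row) for row in rows], sys.stdout)
"""


def row_maxes_via_stdin(rows):
    """Send rows to a child process over stdin and parse its JSON reply."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() writes the payload, closes stdin, and collects stdout,
    # so the data never has to fit on the command line.
    out, _ = proc.communicate(json.dumps(rows).encode("utf-8"))
    if proc.returncode != 0:
        raise RuntimeError("child exited with status %d" % proc.returncode)
    return json.loads(out.decode("utf-8"))
```

With this shape the payload size is no longer tied to the argument-length limit, and the parent also gets the result back through the child's stdout instead of only seeing printed values.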