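The following code simulates a producer/consumer model that collects data from the forex broker FXCM and writes it to a database. Each producer process establishes its own session-based connection to the broker.
When you run the code below you will see the queue backlog grow, although it seems to be worse when running the real code.
I am not sure whether it is relevant, but the sessions are created with the python-forexconnect API, which is written in C++ and uses Boost.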

from multiprocessing import Process, Queue, cpu_count
from datetime import datetime, timedelta
import numpy as np
import time


def dummy_data(dtto):
    # 300 daily bars ending at dtto, newest first: column 0 holds the
    # dates, columns 1-5 hold random prices
    dates = np.array(
        [dtto - timedelta(days=i) for i in range(300)])
    price_data = np.random.rand(len(dates), 5)
    return np.concatenate(
        (np.vstack(dates), price_data), axis=1)


def get_bars(q2, ms, symbol, dtfm, dtto, time_frame):
    # Walk backwards from dtto to dtfm, one chunk of bars per iteration
    stop_date = dtfm
    while dtto > stop_date:
        data = dummy_data(dtto)
        dtfm = data[-1, 0]
        dtto = data[0, 0]
        q2.put((symbol, dtfm, dtto))
        # Switch to date
        dtto = dtfm


def producer(q1, q2):
    # client = fx.Client(....)
    client = 'broker session'
    while True:
        job = q1.get()
        if job is None:
            break  # sentinel: no more jobs
        sym, dtfm, dtto, tf = job
        # Get price data from broker
        get_bars(q2, client, sym, dtfm, dtto, tf)


def consumer(q2):
    while True:
        bars = q2.get()
        if bars is None:
            break  # sentinel: no more bars
        print(q2.qsize(), bars[0], bars[1], bars[2])  # write to db


if __name__ == '__main__':
    q1, q2 = Queue(), Queue()

    # instruments = client.get_offers()
    # instruments = ['GBP/USD', 'EUR/USD',...]
    instruments = range(63)  # 63 dummy instruments
    # Places jobs into the queue for each symbol
    for symbol in instruments:
        # Job tuple is (symbol, dtfm, dtto, time_frame); the date range and
        # time frame here are placeholder values, not the real ones
        q1.put((symbol, datetime(2000, 1, 1), datetime(2018, 1, 1), 'D1'))
    # Setup producers and consumers
    pp, cp = range(6), range(2)
    pro = [Process(target=producer, args=(q1, q2,)) for i in pp]
    con = [Process(target=consumer, args=(q2,)) for i in cp]
    for p in pro: p.start()
    for p in con: p.start()
    # This is just here to stop this script and does not
    # exist in the real version
    for i in pp: q1.put(None)
    for p in pro: p.join()
    # All producers are done, so tell each consumer to stop as well
    for i in cp: q2.put(None)
    for p in con: p.join()

Answer 0 (score: 1)
The poor performance of multiprocessing.Queue.get() is a known problem (there are several questions about it on Stack Overflow as well, but no answers that are generally useful).
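To get a rough number for your own machine, something like the following can help. It is only a minimal sketch: the function name `time_queue_round_trips` and the payload are made up for illustration, and it times put()/get() pairs inside a single process, so it shows the baseline cost of pushing one message through the queue's pipe rather than the contention you get with several producers and consumers.

```python
import time
from multiprocessing import Queue


def time_queue_round_trips(n_items=1000):
    """Return the average seconds per put()+get() pair on a multiprocessing.Queue."""
    q = Queue()
    payload = ("EUR/USD", 0.0, 1.0)  # hypothetical small message
    start = time.perf_counter()
    for _ in range(n_items):
        q.put(payload)
        # Blocks until the queue's feeder thread has written the item to the pipe
        q.get()
    elapsed = time.perf_counter() - start
    # The queue is empty here, so it can be shut down cleanly
    q.close()
    q.join_thread()
    return elapsed / n_items


if __name__ == "__main__":
    avg = time_queue_round_trips()
    print("average seconds per put/get pair:", avg)
```

Multiplying that figure by the number of messages the producers emit gives a lower bound on the time spent in the queue alone.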
That suggests you should consider another model. One option is to check how process-creation overhead compares with the queue overhead: instead of keeping permanently running processes, launch a process as soon as you have data ready for it. Done this way, the subprocess receives an in-memory copy of the data when the parent process forks. This adds process-creation overhead but removes the queue. It is at least worth considering here, because your consumer writes to a database and does not need to report anything back to the parent.
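A rough sketch of that process-per-batch idea is shown below. It assumes a POSIX system where the "fork" start method is available, and the names `write_to_db`, `run` and `max_children` are hypothetical stand-ins, not part of the question's code. Each batch is handed to a short-lived child that inherits the data through the fork and exits once the write is done, so no queue is involved.

```python
import multiprocessing as mp


def write_to_db(symbol, bars):
    """Hypothetical stand-in for the real database write."""
    return len(bars)


def run(batches, max_children=2):
    """Write each (symbol, bars) batch in its own short-lived child process."""
    # Assumption: POSIX only. With "fork" the child starts with a copy of
    # the parent's memory, so `bars` is not sent through a queue or pipe.
    ctx = mp.get_context("fork")
    running = []
    exit_codes = []
    for symbol, bars in batches:
        # Keep at most max_children writers alive at the same time
        while len(running) >= max_children:
            oldest = running.pop(0)
            oldest.join()
            exit_codes.append(oldest.exitcode)
        child = ctx.Process(target=write_to_db, args=(symbol, bars))
        child.start()
        running.append(child)
    # Wait for the remaining writers
    for child in running:
        child.join()
        exit_codes.append(child.exitcode)
    return exit_codes


if __name__ == "__main__":
    dummy_batches = [(i, list(range(300))) for i in range(5)]
    print(run(dummy_batches))
```

Whether this beats the queue depends on how large each batch is relative to the cost of a fork, so it is worth timing both variants with realistic data before committing to one.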
Python is a great language, but it is not the best performer when it comes to parallel processing.