I'm trying to make my parser multithreaded via a Queue. It seems to work, but my queue is hanging. I'd appreciate it if someone could tell me how to fix this, since I rarely write multithreaded code.

This code reads from the queue:

from silk import *
import json
import datetime
import pandas
import Queue
from threading import Thread
l = []
q = Queue.Queue()
def parse_record():
    d = {}
    while not q.empty():
        rec = q.get()
        d['timestamp'] = rec.stime.strftime("%Y-%m-%d %H:%M:%S")
        # ... many ops like this
        d['dport'] = rec.dport
        l.append(d)  # l is global
And this fills the queue:
def parse_records():
    ffile = '/tmp/query.rwf'
    flows = SilkFile(ffile, READ)
    numthreads = 2

    # fill queue
    for rec in flows:
        q.put(rec)

    # work on Queue
    for i in range(numthreads):
        t = Thread(target=parse_record)
        t.daemon = True
        t.start()

    # blocking
    q.join()

    # never reached
    data_df = pandas.DataFrame.from_records(l)
    return data_df
I only call parse_records() from my main. It never terminates.
Answer 0 (score: 2)
...even if empty() returns False, there is no guarantee that a subsequent call to get() will not block, because another thread may have drained the queue in the meantime.

You should at least use get_nowait(), or you risk hanging. More importantly, join() is only released once every queued item has been marked with a Queue.task_done() call:

If join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).

As a side note, l.append(d) is not atomic and should be protected with a lock.
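The task_done()/join() contract is easy to see in a minimal self-contained sketch (written for Python 3, where the module is spelled queue rather than Queue; the numbers are just placeholder work items):

```python
import queue
import threading

q = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return
        with lock:
            results.append(item * 2)
        q.task_done()  # without this, q.join() below blocks forever

for i in range(5):
    q.put(i)

for _ in range(2):
    t = threading.Thread(target=worker, daemon=True)
    t.start()

q.join()  # returns once every put() item has a matching task_done()
print(sorted(results))  # → [0, 2, 4, 6, 8]
```

Comment out the q.task_done() line and q.join() never returns, which is exactly the hang described in the question.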
from silk import *
import json
import datetime
import pandas
import Queue
from threading import Thread, Lock

l = []
l_lock = Lock()
q = Queue.Queue()

def parse_record():
    while 1:
        try:
            rec = q.get_nowait()
        except Queue.Empty:
            return
        d = {}  # a fresh dict per record, or every list entry aliases the same object
        d['timestamp'] = rec.stime.strftime("%Y-%m-%d %H:%M:%S")
        # ... many ops like this
        d['dport'] = rec.dport
        with l_lock:
            l.append(d)  # l is global
        q.task_done()
By using a thread pool from the standard library, you can shorten your code considerably:
from silk import *
import json
import datetime
import pandas
import multiprocessing.pool

def parse_record(rec):
    d = {}
    d['timestamp'] = rec.stime.strftime("%Y-%m-%d %H:%M:%S")
    # ... many ops like this
    d['dport'] = rec.dport
    return d

def parse_records():
    ffile = '/tmp/query.rwf'
    flows = SilkFile(ffile, READ)
    pool = multiprocessing.pool.ThreadPool(2)
    data_df = pandas.DataFrame.from_records(pool.map(parse_record, flows))
    pool.close()
    return data_df
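The same ThreadPool pattern works in Python 3, and it doesn't need silk to demonstrate; in this sketch FakeRec is a hypothetical stand-in for a SiLK flow record:

```python
from multiprocessing.pool import ThreadPool

class FakeRec:
    """Hypothetical stand-in for a SiLK flow record, just for illustration."""
    def __init__(self, dport):
        self.dport = dport

def parse_record(rec):
    # each worker builds and returns its own dict, so no shared state or locks
    return {'dport': rec.dport}

flows = [FakeRec(p) for p in (80, 443, 22)]

pool = ThreadPool(2)
rows = pool.map(parse_record, flows)  # map() preserves input order
pool.close()
pool.join()
print(rows)  # → [{'dport': 80}, {'dport': 443}, {'dport': 22}]
```

Because each call returns its result instead of appending to a global, the lock and the task_done() bookkeeping disappear entirely; pool.map() handles the fan-out and collection.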