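I'm trying to figure out this synchronization problem in Python. I have one producer thread and (optionally) multiple consumer threads, depending on the command line, e.g. ./script sums.txt -c 10. With 1 producer and 1 consumer there is no problem, because the synchronization is handled by the Queue.
The problem appears with more than 1 consumer thread: thread 1 may take an item from the queue and start processing it, while thread 2 does the same with the next item, finishes faster than thread 1, and prints before thread 1. I tried to simulate this with random timers.
My output with the random timers, running "./script sommen.txt -c 2", is shown below the question. As you can see, the second item in the queue is processed before the first. Without the random timers this rarely happens, because the operations are very simple and the threads are fast enough. Is there a way to solve this? I thought about locks, but wouldn't that make the program inefficient?
Another thing: what is the best way to clean up the threads? I know when my queue is finished (sentinel value), but how should I clean up the threads?
Thanks a lot!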
Consumers is set to: 2
I'm thread number: 4316991488 Read (P): 12 + 90
I'm thread number: 4316991488 Read (P): 420 / 20
I'm thread number: 4316991488 Read (P): 12 + 90
I'm thread number: 4316991488 Read (P): 420 / 20
Monitor is done
I'm thread number: 4329586688 Write (C): 420 / 20 = 21.0
I'm thread number: 4324331520 Write (C): 12 + 90 = 102
-
#!/usr/bin/env python
import threading
import operator
import sys
import queue
import optparse
from time import sleep
import random
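def optionsparser():
    """Parse the command line and return (filename, consumer_count)."""
    parser = optparse.OptionParser(
        usage="usage: %prog file [Options]")
    parser.add_option("-c", "--consumer", dest="consumer", type="int",
                      help="consumer <ident> [default: %default]")
    parser.set_defaults(consumer=1)
    opts, files = parser.parse_args()
    filename = files[0]
    try:
        # Open the file only to check that it is readable, then close it again.
        with open(filename):
            pass
        return (filename, opts.consumer)
    except IOError:
        print('Oh dear I/O Error')
        # Exit here; otherwise the caller would receive None and fail later.
        sys.exit(1)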
def readitems(filename):
    print("Read from file: ", filename)
    with open(filename, 'r') as f:
        # The with block closes the file, so no explicit close() is needed.
        mylist = [line.rstrip('\n') for line in f]
    try:
        for _line in mylist:
            data = _line.split(' ')
            qprint.put(data)  # write to monitor queue
            qsum.put(data)  # write to consumer queue
    except ValueError as e:
        print(e)
    except RuntimeError as err:
        print(err)
    finally:
        # Sentinel values tell the monitor and the consumers that input has ended.
        qsum.put("Done Flag")
        qprint.put("Done Flag")
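def consumer(qsum):
    while True:
        # Random delay to simulate consumers finishing out of order.
        sleeptime = random.randint(1, 10)
        sleep(sleeptime)
        try:
            # Call get() only once per iteration; a second get() would
            # silently discard the item fetched by the first one.
            data = qsum.get()
            if data == "Done Flag":
                print("Monitor queue empty", threading.get_ident())
                # Put the flag back for the other consumers
                qsum.put("Done Flag")
                # cleanup here: leaving the loop lets the thread terminate
                break
            else:
                calc(data)
        except EnvironmentError as Err:
            print(Err)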
def calc(data):
    try:
        sleeptime = random.randint(1,10)
        sleep(sleeptime)
        getal1, diff, getal2 = data
        getal1 = int(getal1)
        getal2 = int(getal2)
        if diff == '+':
            print("I'm thread number:", threading.get_ident(), "Write (C):", str(getal1), diff, str(getal2), "=", operator.add(getal1, getal2))
        elif diff == '-':
            print("I'm thread number:", threading.get_ident(), "Write (C):", str(getal1), diff, str(getal2), "=", operator.sub(getal1, getal2))
        elif diff == '*':
            print("I'm thread number:", threading.get_ident(), "Write (C):", str(getal1), diff, str(getal2), "=", operator.mul(getal1, getal2))
        elif diff == '/':
            print("I'm thread number:", threading.get_ident(), "Write (C):", str(getal1), diff, str(getal2), "=", operator.truediv(getal1, getal2))
        elif diff == '%':
            print("I'm thread number:", threading.get_ident(), "Write (C):", str(getal1), diff, str(getal2), "=", operator.mod(getal1, getal2))
        elif diff == '**':
            print("I'm thread number:", threading.get_ident(), "Write (C):", str(getal1), diff, str(getal2), "=", operator.pow(getal1, getal2))
        else:
            print("I'm thread number:", threading.get_ident(), "Write (C):", str(getal1), diff, str(getal2), "=", "Unknown operator!")
    except ZeroDivisionError as Err:
        print(Err)
    except ValueError:
        print("Wrong input")
def producer(reqs):
    try:
        readitems(reqs)
    except IndexError as e:
        print(e)
def monitor(qprint):
    while True:
        try:
            # Call get() only once per iteration so that no item is skipped.
            data = qprint.get()
            if data == "Done Flag":
                print("Monitor is done")
                break
            else:
                getal1, diff, getal2 = data
                print("I'm thread number:", threading.get_ident(), "Read (P):", str(getal1), diff, str(getal2))
        except RuntimeError as e:
            print(e)
if __name__ == '__main__':
    try:
        reqs = optionsparser()
        # create queues
        qprint = queue.Queue()
        qsum = queue.Queue()
        # monitor thread
        t2 = threading.Thread(target=monitor, args=(qprint,))
        t2.start()
        # create consumer threads
        thread_count = reqs[1]
        print("Consumers is set to:", thread_count)
        for i in range(thread_count):
            t = threading.Thread(target=consumer, args=(qsum,))
            t.start()
        # start producer
        producer(reqs[0])
    except RuntimeError as Err:
        print(Err)
    except AssertionError as e:
        print(e)
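Answer 0 (score: 0)
Using threads is effective when the work can be split into tasks that are handled independently. If you want to use threads, keep in mind that parallelized code is more efficient when it has no locking points, or very few. A locking point can be any shared resource.
In your case you just produce and consume data and want it to stay synchronized. Running this code sequentially would be more efficient; otherwise you have to define more precisely which tasks can benefit from parallelization.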
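To illustrate the sequential alternative, here is a minimal sketch. It assumes the same "number operator number" line format as the question; the names OPS, evaluate and run_sequentially are made up for this example and do not come from the original script.

```python
import operator

# Operator symbol -> function; mirrors the branches in the question's calc().
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
    '%': operator.mod,
    '**': operator.pow,
}


def evaluate(line):
    """Evaluate one 'a op b' line and return the result as a string."""
    left, symbol, right = line.split(' ')
    func = OPS.get(symbol)
    if func is None:
        return "Unknown operator!"
    try:
        return str(func(int(left), int(right)))
    except ZeroDivisionError as err:
        return str(err)


def run_sequentially(lines):
    """Process lines one at a time, so output order equals input order."""
    return ["{} = {}".format(line, evaluate(line)) for line in lines]


if __name__ == '__main__':
    for row in run_sequentially(["12 + 90", "420 / 20"]):
        print(row)
```

With a single thread of control there is nothing to synchronize, so the ordering problem cannot occur.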
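Answer 1 (score: 0)
First of all: don't use Python threads to speed up CPU-bound tasks such as calculations. You will never see anything but a slowdown, because of the GIL. Use Python threads for I/O-bound tasks, such as fetching URLs.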
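As a hedged illustration of the I/O-bound case, the sketch below uses the standard library's concurrent.futures.ThreadPoolExecutor rather than hand-written threads, which is a swapped-in technique and not something the question uses. fake_fetch is a hypothetical stand-in that only sleeps; no real network access happens.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Stand-in for a blocking network call (hypothetical; no real I/O)."""
    time.sleep(0.05)  # a sleeping thread releases the GIL, like a thread waiting on I/O
    return "fetched " + url


def fetch_all(urls, max_workers=4):
    # Executor.map returns results in the order of the inputs,
    # even when the individual calls finish in a different order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, urls))


if __name__ == '__main__':
    print(fetch_all(["a", "b", "c"]))
```

Because map preserves input order, this also sidesteps the ordering problem for workloads that fit the "apply one function to each item" shape.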
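If you want the results to arrive in the order in which the tasks were issued, give each queue element a sequence number. That way every task knows where its result belongs.
Use an ordered collection (e.g. a list) and store the results produced by the worker threads with the sequence number as the index. Since you may receive the results in reverse order, you need to store all of them (you cannot stream them).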
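The sequence-number idea can be sketched roughly as follows. This is an illustration under assumptions, not the asker's code: SENTINEL, worker and process_in_order are hypothetical names, and squaring a number stands in for the real calculation.

```python
import queue
import random
import threading
import time

SENTINEL = object()  # hypothetical end-of-input marker


def worker(tasks, results):
    """Take (index, value) pairs and store each result at its own index."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break  # leaving the loop ends the thread
        index, value = item
        time.sleep(random.uniform(0, 0.01))  # simulate uneven processing time
        results[index] = value * value  # placeholder for the real calculation


def process_in_order(values, worker_count=2):
    tasks = queue.Queue()
    results = [None] * len(values)  # one slot per sequence number
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(worker_count)]
    for t in threads:
        t.start()
    for index, value in enumerate(values):
        tasks.put((index, value))
    for _ in threads:
        tasks.put(SENTINEL)  # one sentinel per worker
    for t in threads:
        t.join()  # wait until every worker has finished
    return results


if __name__ == '__main__':
    print(process_in_order([1, 2, 3, 4], worker_count=3))
```

Each worker writes only to its own slot, so the order in which the workers finish does not affect the order of the results. Sending one sentinel per worker and then calling join() is also one simple way to shut the threads down cleanly.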
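I don't see why you would use a lock here. First, a lock defeats the purpose of parallel processing by blocking workers that are otherwise independent. Second, locks are hard to get right and prone to subtle bugs. Queues are much friendlier.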
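One possible lock-free arrangement, sketched under assumptions: the workers never print and never share a lock; they hand each result to a second queue, and only the main thread reads it. The names STOP, worker and run are hypothetical, and adding two numbers stands in for the real work.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel


def worker(tasks, output):
    while True:
        item = tasks.get()
        if item is STOP:
            break
        seq, (a, b) = item
        output.put((seq, a + b))  # hand the result over; no print, no lock


def run(pairs, worker_count=3):
    tasks, output = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, output))
               for _ in range(worker_count)]
    for t in threads:
        t.start()
    for seq, pair in enumerate(pairs):
        tasks.put((seq, pair))
    for _ in threads:
        tasks.put(STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers are done, so the output queue holds one result per task.
    collected = [output.get() for _ in range(len(pairs))]
    # Sequence numbers are unique, so sorting restores the input order.
    return [value for _, value in sorted(collected)]


if __name__ == '__main__':
    print(run([(12, 90), (1, 2), (5, 5)]))
```

The queues do all the synchronization internally, so the worker code stays free of explicit locking.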