Using Linux and Python 2.7.6, I have a script that uploads a large number of files at one time. I am using multi-threading with the Queue and threading modules.

I implemented a handler for SIGINT to stop the script if the user hits Ctrl-C. I prefer to use daemon threads so I don't have to clear the queue, which would require a lot of rewriting to give the SIGINT handler access to the Queue object, since handlers don't take parameters.

To make sure the daemon threads finish and clean up before sys.exit(), I am using threading.Event() and threading.clear() to make the threads wait. This code seems to work, since print threading.enumerate() shows only the main thread just before the script terminates when I debug. Just to be sure, I was wondering whether there is any insight into this cleanup implementation that I might be missing, even though it seems to work for me:
def signal_handler(signal, frame):
    global kill_received
    kill_received = True
    msg = (
        "\n\nYou pressed Ctrl+C!"
        "\nYour logs and their locations are:"
        "\n{}\n{}\n{}\n\n".format(debug, error, info))
    logger.info(msg)
    threads = threading.Event()
    threads.clear()
    while True:
        time.sleep(3)
        threads_remaining = len(threading.enumerate())
        print threads_remaining
        if threads_remaining == 1:
            sys.exit()
def do_the_uploads(file_list, file_quantity,
                   retry_list, authenticate):
    """The uploading engine"""
    value = raw_input(
        "\nPlease enter how many concurrent "
        "uploads you want at one time (example: 200)> ")
    value = int(value)
    logger.info('{} concurrent uploads will be used.'.format(value))
    confirm = raw_input(
        "\nProceed to upload files? Enter [Y/y] for yes: ").upper()
    if confirm == "Y":
        kill_received = False
        sys.stdout.write("\x1b[2J\x1b[H")
        q = CustomQueue()

        def worker():
            global kill_received
            while not kill_received:
                item = q.get()
                upload_file(item, file_quantity, retry_list, authenticate, q)
                q.task_done()

        for i in range(value):
            t = Thread(target=worker)
            t.setDaemon(True)
            t.start()

        for item in file_list:
            q.put(item)
        q.join()

        print "Finished. Cleaning up processes...",
        # Allow the threads to clean up
        time.sleep(4)
def upload_file(file_obj, file_quantity, retry_list, authenticate, q):
    """Uploads a file. One file per its own thread. No batch style. This way
    if one upload fails no others are affected."""
    absolute_path_filename, filename, dir_name, token, url = file_obj
    url = url + dir_name + '/' + filename
    try:
        with open(absolute_path_filename) as f:
            r = requests.put(url, data=f, headers=header_collection, timeout=20)
    except requests.exceptions.ConnectionError as e:
        pass
    if src_md5 == r.headers['etag']:
        file_quantity.deduct()
Answer 0 (score: 4)
If you want to handle Ctrl+C, it is enough to handle the KeyboardInterrupt exception in the main thread. Don't assign X = some_value inside a function unless you also declare global X there; otherwise you merely create a local variable. Using time.sleep(4) to let threads clean up is a code smell: you don't need it.
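A minimal sketch of that rule, reusing the question's kill_received flag (the two helper functions are made up for illustration):

kill_received = False

def request_stop():
    global kill_received    # required: this function rebinds the global name
    kill_received = True

def should_stop():
    return kill_received    # no global needed: the name is only read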
You wrote: "I'm using threading.Event() and threading.clear() to make the threads wait."

This code has no effect on your threads:
# create local variable
threads = threading.Event()
# clear internal flag in it (that is returned by .is_set/.wait methods)
threads.clear()
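For contrast, here is a minimal sketch (with a sleep as a stand-in workload) of how a threading.Event actually coordinates threads: one Event is created once, passed to the workers explicitly, and set (not cleared) to signal them:

import threading
import time

stop_event = threading.Event()      # created once, shared with every worker

def worker(stop_event):
    while not stop_event.is_set():  # poll the shared flag
        time.sleep(.1)              # stand-in for a unit of work

t = threading.Thread(target=worker, args=(stop_event,))
t.start()
stop_event.set()                    # the worker observes the flag and returns
t.join()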
Don't call logger.info() from a signal handler in a multithreaded program; it may deadlock your program. Only a limited set of functions may be called from a signal handler. The safe option is to set a global flag in it and return:
def signal_handler(signal, frame):
    global kill_received
    kill_received = True
    # return (no more code)
The signal may be delayed until q.join() returns. And even if the signal were delivered immediately, q.get() blocks your child threads: they would hang until the main thread exits. To fix both issues you can use a sentinel to signal the child threads that there is no more work, and drop the signal handler completely in this case:
def worker(stopped, queue, *args):
    for item in iter(queue.get, None):  # iterate until queue.get() returns None
        if not stopped.is_set():  # a simple global flag would also work here
            upload_file(item, *args)
        else:
            break  # exit prematurely
    # do child specific clean up here

# start threads
q = Queue.Queue()
stopped = threading.Event()  # set when threads should exit prematurely
threads = set()
for _ in range(number_of_threads):
    t = Thread(target=worker, args=(stopped, q) + other_args)
    threads.add(t)
    t.daemon = True
    t.start()

# provide work
for item in file_list:
    q.put(item)
for _ in threads:
    q.put(None)  # put sentinel to signal the end

while threads:  # until there are alive child threads
    try:
        for t in threads:
            t.join(.3)  # use a timeout to get KeyboardInterrupt sooner
            if not t.is_alive():
                threads.remove(t)  # remove dead
                break
    except (KeyboardInterrupt, SystemExit):
        print("got Ctrl+C (SIGINT) or exit() is called")
        stopped.set()  # signal threads to exit gracefully
I've renamed value to number_of_threads and used an explicit threads set. Note that if an individual upload_file() blocks, the program won't exit on Ctrl-C.
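One way to bound that risk, sketched under the assumption that uploads go through requests as in the question (upload_one() is a hypothetical stand-in, not the question's upload_file()): always pass a timeout so a single PUT cannot block its worker indefinitely.

import requests

def upload_one(path, url, headers, timeout=20):
    # the timeout bounds how long one upload can block its worker thread
    try:
        with open(path, 'rb') as f:
            r = requests.put(url, data=f, headers=headers, timeout=timeout)
        return r.status_code
    except requests.exceptions.RequestException:
        return None  # the caller decides whether to retry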
Your case seems to be simple enough for the multiprocessing.Pool interface:
from multiprocessing.pool import ThreadPool
from functools import partial

def do_uploads(number_of_threads, file_list, **kwargs_for_upload_file):
    process_file = partial(upload_file, **kwargs_for_upload_file)
    pool = ThreadPool(number_of_threads)  # number of concurrent uploads
    try:
        for _ in pool.imap_unordered(process_file, file_list):
            pass  # you could report progress here
    finally:
        pool.close()  # no more additional work
        pool.join()   # wait until current work is done
It should exit gracefully on Ctrl-C, i.e., uploads that are in progress are allowed to finish while new uploads are not started.
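A hypothetical invocation, assuming upload_file() is adapted to drop its q parameter (the pool replaces the hand-rolled queue) and otherwise keeps the question's keyword arguments:

do_uploads(200, file_list,
           file_quantity=file_quantity,
           retry_list=retry_list,
           authenticate=authenticate)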