Python multithreaded application does not terminate

Time: 2015-11-30 11:55:29

Tags: python multithreading

Here is my code. It basically just takes a list of 94,000+ URLs and collects the HTTP status code for each of them:

    #!/usr/bin/python3
    import threading
    from queue import Queue
    import urllib.request
    import urllib.parse
    from http.client import HTTPConnection
    import socket
    import http.client
    #import httplib

    url_input = open("urls_prod_sort.txt", "r").read()
    urls = url_input[:url_input.rfind('\n')].split('\n')

    #urls = urls[:100]
    url_502 = []
    url_logs = []

    url_502_lock = threading.Lock()
    print_lock = threading.Lock()



    def sendRequest(url_u, http_method='GET', data=None):
        use_proxy = "http://xxxxxxxx:8080"
        proxies = {"http": use_proxy}
        proxy = urllib.request.ProxyHandler(proxies)
        handler = urllib.request.HTTPHandler()
        url = "http://" + url_u
        with print_lock:
            print(url)
        opener = urllib.request.build_opener(proxy, handler)
        urllib.request.install_opener(opener)
        request = urllib.request.Request(url, data)
        request.add_header("User-agent", "| MSIE |")
        request.get_method = lambda: http_method

        try:
            response = urllib.request.urlopen(request)
            response_code = response.code
        except urllib.error.HTTPError as error:
            response_code = error.code
        except urllib.error.URLError as e2:
            response_code = 701
        except socket.timeout as e3:
            response_code = 702
        except socket.error as e4:
            response_code = 703
        except http.client.IncompleteRead as e:
            response_code = 700

        if response_code == 502:
            with url_502_lock:
                #url_502.append(url)
                url_502_file = open("url_502_file.txt", "a")
                url_502_file.write(url + "\n")
                url_502_file.close()

        with print_lock:
            #url_logs.append(url + "," + str(response_code))
            url_all_logs_file = open("url_all_logs.csv", "a")
            url_all_logs_file.write(url + "," + str(response_code) + '\n')
            url_all_logs_file.close()

            #print (url + "," + str(response_code))
            #print (response_code)

        return response_code


    def worker():
        while True:
            url = q.get()
            if url == ":::::":
                break
            else:
                sendRequest(url)
                q.task_done()


    #======================================


    q = Queue()

    for threads in range(1000):
        t = threading.Thread(target=worker)
        t.daemon = True
        t.start()


    for url in urls:
        q.put(url)
    q.put(":::::")
    q.join()

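However, the program never seems to terminate (even after all of the URLs have been processed), which forces me to Ctrl-C it. I then get the following error: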

    Traceback (most recent call last):
      File "./url_sc_checker.py", line 120, in <module>
        q.join()
      File "/usr/lib/python3.2/queue.py", line 82, in join
        self.all_tasks_done.wait()
      File "/usr/lib/python3.2/threading.py", line 235, in wait
        waiter.acquire()
    KeyboardInterrupt

1 answer:

Answer 0 (score: 1)

The reason your program does not terminate is simple: your worker runs an infinite loop:

    def worker():
        while True:
            ...

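You need to raise an exception, break, or have a termination condition in the while statement. Otherwise your program keeps trying to fetch the next job from the queue, not knowing that there will never be a next job.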

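The usual way to do this is to put a sentinel value on the queue. When a worker takes a job off the queue, it checks whether the job is the sentinel value and, if so, breaks out of the loop.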
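A minimal sketch of the sentinel approach follows. The names `run_jobs`, `handle`, `SENTINEL`, and `NUM_WORKERS` are made up for illustration, and `handle` is assumed not to raise. Two details matter if the producer waits on `q.join()`: each worker needs its own sentinel, and the sentinel must be marked done like any other item. In the question's code the worker breaks on the sentinel without calling `q.task_done()`, so `q.join()` is left waiting for that one item.

```python
import threading
from queue import Queue

SENTINEL = None      # hypothetical marker meaning "no more work"
NUM_WORKERS = 4      # illustrative; the question starts 1000 threads


def run_jobs(jobs, handle):
    """Process every job with a pool of workers, then shut the workers down.

    Assumes handle(job) does not raise.
    """
    q = Queue()

    def worker():
        while True:
            job = q.get()
            try:
                if job is SENTINEL:
                    break          # leave the loop; the finally block still runs
                handle(job)
            finally:
                q.task_done()      # count sentinels too, or q.join() blocks forever

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(SENTINEL)            # one sentinel per worker
    q.join()                       # every item, sentinels included, is done
    for t in threads:
        t.join()                   # every worker has left its loop
```

With one sentinel per worker, the threads no longer need to be daemon threads: each one exits on its own once it sees its sentinel.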

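Another way is to have a global condition variable that is checked in the while condition. Once the job producer has pushed all of the items onto the queue, it joins the queue. When all jobs are done, the producer unblocks and terminates our process's threads.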
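One way to sketch the flag-based variant is shown below. It uses a `threading.Event` as the shared flag, which is my own choice rather than something the answer prescribes, and the names `run_with_stop_flag` and `handle` are hypothetical. The workers poll the queue with a short timeout so that they wake up regularly and recheck the flag.

```python
import queue
import threading


def run_with_stop_flag(jobs, handle, num_workers=4):
    """Process all jobs, then tell the workers to stop via a shared flag."""
    q = queue.Queue()
    stop = threading.Event()       # the shared "we are finished" flag

    def worker():
        while not stop.is_set():
            try:
                job = q.get(timeout=0.1)   # wake up periodically to recheck the flag
            except queue.Empty:
                continue
            try:
                handle(job)
            finally:
                q.task_done()      # mark the job done even if handle() raised

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    q.join()                       # blocks until every job is marked done
    stop.set()                     # producer is unblocked: signal the workers
    for t in threads:
        t.join()
```

The timeout value of 0.1 seconds is arbitrary; a shorter value makes shutdown quicker at the cost of more idle wake-ups.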

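Another possible reason your process does not terminate: if your sendRequest raises an unexpected exception, the thread terminates and you are left with jobs that are never marked as done.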
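To guard against that, the worker can catch exceptions and call `task_done()` in a `finally` block. The sketch below uses hypothetical names (`safe_worker`, `handle`, `failures`, `demo`) and a simulated failure in place of a real request. It is only meant to show that `q.join()` still returns when some jobs fail.

```python
import threading
from queue import Queue


def safe_worker(q, handle, failures):
    """Worker loop that survives exceptions and always marks jobs as done."""
    while True:
        job = q.get()
        try:
            handle(job)
        except Exception as exc:
            failures.append((job, exc))   # list.append is atomic in CPython
        finally:
            q.task_done()                 # runs on success and on failure


def demo():
    q = Queue()
    failures = []

    def handle(job):
        # Simulated failure standing in for an unexpected error in a request.
        if job % 3 == 0:
            raise ValueError("simulated failure for job %d" % job)

    for _ in range(4):
        threading.Thread(target=safe_worker, args=(q, handle, failures),
                         daemon=True).start()
    for job in range(10):
        q.put(job)
    q.join()                              # returns even though some jobs failed
    return sorted(job for job, _ in failures)
```

Here the workers are daemon threads that stay blocked in `q.get()` after the last job, as in the question, so the process exits once the main thread finishes.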