Question

I have a list of IDs, about 50,000 per day.

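I have to send 50k requests to a server every day (the server is in the same city), fetch the information, and store it in a database. I did this with a loop and threads, and I noticed that after some unknown amount of time it stops fetching and storing.

Take a look at my code snippet: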

    import re,urllib,urllib2
    import mysql.connector as sql
    import threading
    from time import sleep
    import idvalid

    conn = sql.connect(user="example",password="example",host="127.0.0.1",database="students",collation="cp1256_general_ci")
    cmds = conn.cursor()
    ids=[] #here is going to be stored the ID's

    def fetch():
        while len(ids)>0:#it will loop until the list of ID's is finish
            try:
                idnumber = ids.pop()
                content = urllib2.urlopen("http://www.example.com/fetch.php?id="+idnumber,timeout=120).read()
                if content.find('<font color="red">') != -1:
                    pass
                else:
                    name=content[-20:]
                    cmds.execute("INSERT INTO `students`.`basic` (`id` ,`name`)VALUES ('%s', '%s');"%(idnumber,name))
            except Exception,r:
                print r,"==>",idnumber
                sleep(0.5)#i think sleep will help in threading ? i'm not sure
                pass
            print len(ids)#print how many ID's left

    for i in range(0,50):#i've set 50 threads
        threading.Thread(target=fetch).start()

It keeps printing the number of IDs left, and then at some unknown point it stops printing, fetching, and storing.

Answer 1

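Networking and threading are both non-trivial... the most likely cause is a network event that leaves a thread hanging. I would be interested to hear whether people have a solution, because I have run into the same problem of threads that stop responding.

But there are a few things I would definitely change in your code: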

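I would never catch "Exception". Catch only the exceptions you know how to handle. If a network error occurs in one of your threads, you could retry instead of giving up on the ID.
There is a race condition in your code: you first check whether any work is left, and then you take it out. By that second point in time, the remaining work may already be gone, which results in an exception. If you find this hard to fix, there is a great Python object that is meant for passing objects between threads without race conditions and deadlocks: the Queue object. Check it out.
"sleep(0.5)" does not help threading in general. It should not be necessary. It may reduce the chance of hitting a race condition, but it is better to eliminate race conditions entirely. On the other hand, having 50 threads hammering the web server at full speed may not be a very friendly thing to do. Make sure you stay within what the service can handle.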
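To make the points above concrete, here is a minimal sketch of how the fetch loop could look with a Queue. It is only an illustration under some assumptions, not a drop-in fix: it is written for Python 3 (where the `Queue` module is named `queue` and `urllib2` became `urllib.request`), and `MAX_ATTEMPTS`, `NUM_WORKERS`, `fetch_page`, `worker`, `drain`, and `run` are names I made up for this example. The URL is the one from your question. Fetched pages are handed back on a second queue so that whatever writes to the database can do so from one place.

```python
import queue
import socket
import threading
import urllib.error
import urllib.request

MAX_ATTEMPTS = 3  # hypothetical retry limit per ID
NUM_WORKERS = 5   # hypothetical; deliberately far fewer than 50


def fetch_page(idnumber):
    """Download the page for one ID (URL taken from the question)."""
    url = "http://www.example.com/fetch.php?id=" + idnumber
    with urllib.request.urlopen(url, timeout=120) as response:
        return response.read()


def worker(tasks, done, failed, fetch):
    """Take (id, attempt) pairs off the queue until it is empty."""
    while True:
        try:
            # Checking for work and taking it is a single atomic step,
            # so there is no gap between "is anything left?" and pop().
            idnumber, attempt = tasks.get_nowait()
        except queue.Empty:
            return
        try:
            content = fetch(idnumber)
        except (urllib.error.URLError, socket.timeout) as err:
            # Only network errors are handled; anything else propagates.
            if attempt < MAX_ATTEMPTS:
                tasks.put((idnumber, attempt + 1))  # retry, do not drop the ID
            else:
                failed.put((idnumber, str(err)))
        else:
            done.put((idnumber, content))


def drain(q):
    """Collect everything currently in a queue into a list."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def run(ids, fetch=fetch_page, num_workers=NUM_WORKERS):
    """Fetch all IDs with a small pool of threads; return (done, failed)."""
    tasks, done, failed = queue.Queue(), queue.Queue(), queue.Queue()
    for idnumber in ids:
        tasks.put((idnumber, 1))
    threads = [
        threading.Thread(target=worker, args=(tasks, done, failed, fetch))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return drain(done), drain(failed)
```

You would call `run(ids)` once, then insert the `(id, content)` pairs from the first list into the database and inspect the second list to see which IDs still failed after the retries.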

Why do my threads stop?
