I have a list of links and I want to scrape all of them using 5 threads. I managed to start one thread per link, but I will have more than 1000 links, and I don't want to overwhelm my computer or the website.
My question is: how can I scrape, say, 100 links using a fixed number of threads?
This is what I have so far:
    import threading
    import urllib2
    from BeautifulSoup import BeautifulSoup

    def main():
        urls = ["http://google.com", "http://yahoo.com"]
        threads = []
        # Start a request thread for every URL
        for url in urls:
            thread = threading.Thread(target=loadurl, args=(url,))
            thread.start()
            print "[+] Thread started for:", url
            threads.append(thread)
        # All requests started
        print "[+] Requests done"
        for thread in threads:
            thread.join()
        print "[+] Finished!"

    # Just print the page source
    def loadurl(url):
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page)
        print soup
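One common way to cap concurrency is a fixed pool of worker threads draining a shared queue: you start exactly N threads, and each one pulls URLs until the queue is empty. A minimal sketch of that pattern follows; note it is written for Python 3 (`queue` module, `print()` function) while the code above is Python 2, and the actual fetch/parse step is stubbed out with a placeholder so the pooling logic stands on its own — you would call your `loadurl` there instead.

```python
import threading
import queue

NUM_WORKERS = 5  # fixed number of threads, regardless of how many URLs there are

def worker(task_queue, results):
    # Each worker loops, pulling one URL at a time until the queue is drained.
    while True:
        try:
            url = task_queue.get_nowait()
        except queue.Empty:
            return
        # Placeholder for the real fetch/parse step (your loadurl(url)).
        results.append(url)
        task_queue.task_done()

def crawl(urls):
    task_queue = queue.Queue()
    for url in urls:
        task_queue.put(url)
    results = []  # list.append is safe to call from multiple threads in CPython
    threads = [threading.Thread(target=worker, args=(task_queue, results))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# 100 URLs, but never more than 5 threads alive at once
print(len(crawl(["http://example.com/page%d" % i for i in range(100)])))  # → 100
```

On Python 3 you could also skip the hand-rolled pool entirely and use `concurrent.futures.ThreadPoolExecutor(max_workers=5)` with `executor.map(loadurl, urls)`, which implements the same fixed-pool behaviour.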