I'm really scratching my head over this one. This Python/BeautifulSoup script used to run fine on this same computer, and it currently runs fine on another computer. But when I try to run it on the computer in question, the loop gets stuck after a few iterations (it should run through 90,000+ iterations). I get no error message, the internet connection is otherwise working, and I can't figure out why. I've ruled out changes to the website (there were none), and the computer in question is powerful enough to run the script, since it ran it without any problem before.
My Python setup hasn't really changed (apart from installing virtualenv and Flask, which I doubt is related), and I have another script on this same computer that collects tweets with tweepy; it pulls thousands of tweets' worth of data and runs fine without getting stuck.
import urllib2
import re
import csv
from bs4 import BeautifulSoup
import time

def get_Search():
    for loop1 in range(0, 36):  # 36
        for loop2 in range(0, 36):  # 36
            for loop3 in range(0, 36):  # 36
                for loop4 in range(18, 20):  # CHANGEABLE per script
                    for loop5 in range(19, 20):  # CHANGEABLE per script
                        URLchar = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
                                   "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
                                   "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"]  # len(URLchar) = 36
                        url1 = "https://websiteABC.org"
                        url2 = "-"
                        urlComplete = url1 + str(URLchar[loop1]) + str(URLchar[loop2]) + url2 + \
                            str(URLchar[loop3]) + str(URLchar[loop4]) + str(URLchar[loop5])
                        try:
                            page = urllib2.urlopen(urlComplete)
                            soup_FamSearchURL = BeautifulSoup(page, "lxml")
                            page.close()
                            censusSubhead = soup_FamSearchURL.find("title").get_text(strip=True)
                            censusSubhead_decode = censusSubhead.encode("ascii", "ignore")
                            censusString = "Matched Heading"
                            if censusSubhead_decode != None:
                                time.sleep(1)
                            if censusString in censusSubhead_decode:
                                print str(urlComplete) + " yes"
                            else:
                                print str(urlComplete) + " no"
                        except IOError:  # error handling so that internet breakage won't crash the program
                            time.sleep(60)
                            continue
                        loop5 += 1
                    loop4 += 1
                loop3 += 1
            loop2 += 1
        loop1 += 1

get_Search()
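For what it's worth, here is a sketch of the two things I've been experimenting with while debugging: generating the same candidate URLs with `itertools.product` instead of five nested loops, and setting a process-wide socket timeout so a stalled `urlopen` raises an error instead of blocking forever (a silent hang with no timeout would look exactly like a "stuck" loop). This is Python 3-style stdlib code; `websiteABC.org` is the placeholder domain from above, and the 30-second timeout value is just an assumption.

```python
import itertools
import socket

# Same alphabet as URLchar in the script above: digits 0-9 then A-Z.
URLchar = [str(d) for d in range(10)] + [chr(c) for c in range(ord("A"), ord("Z") + 1)]

def candidate_urls():
    """Yield every URL the five nested loops would build, in the same order.

    The range bounds mirror the script: the first three loops cover the full
    alphabet, the last two are the "CHANGEABLE per script" windows (18-20, 19-20).
    """
    base = "https://websiteABC.org"  # placeholder domain from the question
    for a, b, c, d, e in itertools.product(
            range(0, 36), range(0, 36), range(0, 36),
            range(18, 20), range(19, 20)):
        yield base + URLchar[a] + URLchar[b] + "-" + URLchar[c] + URLchar[d] + URLchar[e]

# With no timeout set, a stalled TCP connection inside urlopen blocks
# indefinitely. A default socket timeout makes it raise instead of hang
# (affects urllib2 in Python 2 and urllib.request in Python 3 alike).
socket.setdefaulttimeout(30)  # 30s is an assumed value, tune as needed
```

The generator also makes the total easy to check: 36 × 36 × 36 × 2 × 1 = 93,312 URLs, which matches the "90,000+" iterations I mentioned.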