I have the following code to parse a website:
import json
import os
import urllib2
from bs4 import BeautifulSoup

# data_content_file and count_file are path strings defined earlier in the script.

# Load previously scraped data if present; fall back to an empty dict.
if os.path.isfile(data_content_file):
    try:
        with open(data_content_file) as data_file:
            question_answer = json.load(data_file)
    except Exception:
        question_answer = {}
else:
    question_answer = {}

# Resume from the last saved counter, or start from 1.
if os.path.isfile(count_file):
    f = open(count_file, 'r')
    try:
        start = int(f.read())
    except Exception:
        start = 1
    f.close()
else:
    start = 1

f = open(count_file, 'w+')
for x in xrange(start, 500000):
    try:
        print(x)
        # Overwrite the counter file with the current id so a restart can resume.
        f.seek(0)
        f.truncate()
        f.write(str(x))
        req = urllib2.Request("https://islamqa.info/en/" + str(x),
                              headers={'User-Agent': "Magic Browser"})
        con = urllib2.urlopen(req)
        soup = BeautifulSoup(con.read(), "lxml")
        # ... parsing of soup and the matching except clause continue below ...
I don't know why it freezes on certain values of x. If I stop the script and run it again with the same value of x, it works fine. I tried using a timeout, but then it did not load any pages at all, even with the timeout set to 10000:
req = urllib2.Request("https://islamqa.info/en/"+str(x), headers={'User-Agent' : "Magic Browser"},timeout=10000)
What is the best way to avoid this, or to keep the loop going even when the site hangs?
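One pattern that should help, sketched under the assumption that the freezes are stalled connections rather than server errors: pass a timeout to urlopen() and catch socket.timeout and urllib2.URLError, so a stuck id is retried a few times and then skipped. fetch_page below is a hypothetical helper name introduced for illustration:

import socket
import time
import urllib2

def fetch_page(url, retries=3, timeout=10):
    # Hypothetical helper: return the page body, or None if this id
    # should be skipped so the main loop can continue.
    for _ in xrange(retries):
        try:
            req = urllib2.Request(url, headers={'User-Agent': "Magic Browser"})
            return urllib2.urlopen(req, timeout=timeout).read()
        except urllib2.HTTPError:
            return None    # e.g. 404 for a missing id: no point retrying
        except (socket.timeout, urllib2.URLError):
            time.sleep(2)  # stalled or refused connection: retry shortly
    return None            # still failing after all retries: skip this id

Inside the loop this would replace the Request/urlopen pair:

html = fetch_page("https://islamqa.info/en/" + str(x))
if html is None:
    continue
soup = BeautifulSoup(html, "lxml")

Catching urllib2.HTTPError before urllib2.URLError matters because HTTPError is a subclass of URLError; a missing page returns immediately instead of being retried three times.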