BeautifulSoup script worked before, and works on another computer, but no longer works here?

Time: 2014-11-06 05:00:50

Tags: python-2.7 web-scraping beautifulsoup

I'm really scratching my head over this one. This Python BeautifulSoup script used to run fine on the very same computer, and it currently runs fine on another computer. But when I try to run it on the computer in question, the loop gets stuck after a few iterations (it should run through 90,000+ combinations). I don't get any error message, and the internet connection otherwise works fine, so I can't see why. I've ruled out changes to the website (nothing there has changed), and the machine in question is easily powerful enough to run the script, since it ran it with no problems before.

My Python setup hasn't really changed (apart from installing virtualenv and Flask, which I doubt is related), and I have another script on this same problem computer that collects tweets with tweepy; it pulls in tweet data by the thousands and runs fine without getting stuck.
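
One debugging thought, not yet reflected in the script below: urllib2.urlopen has no timeout by default, so a connection that stalls mid-request just blocks silently, which would look exactly like a loop that gets stuck with no error message. A minimal sketch of the fetch step with an explicit timeout (the timeout argument is standard for urlopen since Python 2.6; the URL is just the first placeholder the loops below would build):

import socket
import urllib2

url = "https://websiteABC.org00-0IJ"  # placeholder: first URL the loops below build

try:
    # With a timeout set (in seconds), a stalled connection raises an
    # exception instead of blocking the loop forever with no output.
    page = urllib2.urlopen(url, timeout=30)
    html = page.read()
    page.close()
except socket.timeout:
    print url + " timed out (read stalled)"
except IOError as err:  # urllib2.URLError is a subclass of IOError
    print url + " failed: " + str(err)

If the hang is network-level, this version should start printing timeouts on the problem machine instead of freezing.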

import urllib2
import re
import csv
from bs4 import BeautifulSoup
import time

def get_Search():

    # Character set for the URL suffixes (len(URLchar) == 36); defined once
    # here instead of being rebuilt on every pass of the innermost loop.
    URLchar = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
        "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
        "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"]

    for loop1 in range(0, 36): #36
        for loop2 in range(0, 36): #36
            for loop3 in range(0, 36): #36
                for loop4 in range(18, 20): # CHANGEABLE per script
                    for loop5 in range(19, 20): # CHANGEABLE per script

                        url1 = "https://websiteABC.org"
                        url2 = "-"
                        urlComplete = url1 + URLchar[loop1] + URLchar[loop2] + url2 + \
                            URLchar[loop3] + URLchar[loop4] + URLchar[loop5]

                        try:
                            page = urllib2.urlopen(urlComplete)

                            soup_FamSearchURL = BeautifulSoup(page, "lxml")
                            page.close()

                            # find() returns None when a page has no <title>;
                            # guard before get_text() to avoid an AttributeError.
                            titleTag = soup_FamSearchURL.find("title")
                            censusString = "Matched Heading"

                            if titleTag is not None:
                                censusSubhead = titleTag.get_text(strip=True)
                                censusSubhead_decode = censusSubhead.encode("ascii", "ignore")
                                time.sleep(1)
                                if censusString in censusSubhead_decode:
                                    print urlComplete + " yes"
                                else:
                                    print urlComplete + " no"

                        except IOError: # so that internet breakage won't crash the program
                            time.sleep(60)
                            continue

get_Search()
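
As an aside on structure (a sketch, not part of the original script; iter_urls is a hypothetical helper): since the loop counters are only ever used as indices into URLchar, the five nested loops can be flattened with itertools.product, which also makes it easy to print a progress marker and see exactly which of the 36 * 36 * 36 * 2 * 1 = 93,312 iterations the script stalls on:

import itertools

URLchar = [str(d) for d in range(10)] + \
    [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # the same 36 characters

def iter_urls():
    # Equivalent to the nested ranges above:
    # (0, 36) x (0, 36) x (0, 36) x (18, 20) x (19, 20) over URLchar.
    for c1, c2, c3, c4, c5 in itertools.product(
            URLchar, URLchar, URLchar, URLchar[18:20], URLchar[19:20]):
        yield "https://websiteABC.org" + c1 + c2 + "-" + c3 + c4 + c5

for count, urlComplete in enumerate(iter_urls()):
    if count % 1000 == 0:
        print count, urlComplete  # progress marker: shows where it stalls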

0 Answers:

No answers yet