Setting up a simple retry function in Python when parsing HTML

Date: 2017-11-10 08:16:42

Tags: python html loops parsing

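I am trying to make this function run again if it does not find the information on the page.

I thought the code below would be a solution, but it does not work. I am not sure how to implement a scraping loop with a simple function. I tried the retrying module, but it had problems installing, so a hard-coded solution would be ideal.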

My code is below:

import time, requests, webbrowser, sys, os, re, json
from bs4 import BeautifulSoup
from colorama import Fore, Back, Style, init
import subprocess as s

url = "http://notimportant.com"

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

def getIds():
    global product_id
    for script in scripts:
        if 'spConfig =' in script.getText():
            #idlive = True
            regex = re.compile(r'var spConfig = new Product.Config\((.*?)\);')
            match = regex.search(script.getText())
            spConfig = json.loads(match.groups()[0])
            for key, attribute in spConfig['attributes'].iteritems():
                for option in attribute['options']:
                    if option['label_uk'] == size:
                        label = option['label_uk'].strip()
                        for product_id in option['products']:
                            print(Fore.CYAN + "Size Found!")
                            print product_id, "-", label
                            #str = product_id
                            #productsizeid = str
        else:
            print(Fore.RED + "Sizes not live yet")
            print("Retrying in 10 seconds . . .")
            time.sleep(10)
            print("Trying again. . .")
            getIds()

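1 Answer:

Answer 0: (score: 0)

Iteration would be the preferred approach here, rather than having the function call itself. Something like: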

import time

url = "http://notimportant.com"
size_alive = False
while not size_alive:
    # do_the_scraping_function is a placeholder: it should return True
    # once it finds 'spConfig =' in script.getText(), and False otherwise
    size_alive = do_the_scraping_function(url)
    if not size_alive:
        print("retrying in 10 seconds")
        time.sleep(10)