Question

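I'm trying to get my code to step through the pages of this website, but I can't get it to loop and increment. It does the first page and then gives up. Is there something I'm doing wrong?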

         if(pageExist is not None):
              if(countitup != pageNum):
                 countitup = countitup + 1
                 driver.get('http://800notes.com/Phone.aspx/%s/%s' % (tele800,countitup))
                 delay = 4
                 scamNum = soup.find_all(text=re.compile(r"Scam"))
                 spamNum = soup.find_all(text=re.compile(r"Call type: Telemarketer"))
                 debtNum = soup.find_all(text=re.compile(r"Call type: Debt Collector"))
                 hospitalNum = soup.find_all(text=re.compile(r"Hospital"))
                 scamCount = len(scamNum) + scamCount
                 spamCount = len(spamNum) + spamCount
                 debtCount = len(debtNum) + debtCount
                 hospitalCount = len(hospitalNum) + hospitalCount
                 block = soup.find(text=re.compile(r"OctoNet HTTP filter"))
                 extrablock = soup.find(text=re.compile(r"returning an unknown error"))
                 type(block) is str 
                 type(extrablock) is str 
                 if(block is not None or extrablock is not None):
                    print("\n Damn. Gimme an hour to fix this.")
                    time.sleep(2000)

Repo: https://github.com/GarnetSunset/Haircuttery/tree/Experimental

Answer 1

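The `pageExist is not None` check seems to be the problem: it tests whether the page is None, and it will most likely never be None. There is no official way to check the HTTP response in Selenium, but we can use something like this: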

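from selenium.webdriver.common.by import By

if driver.find_elements(By.XPATH, "/html/body/p[contains(text(),'400')]"):
    # true when a <p> directly under <body> contains '400' (XPath lookups go through the Selenium driver, not soup)
    ...

or

if '400' in driver.find_element(By.XPATH, '/html/body/p[1]').text:
    ...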

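I'm sure there are other ways to do this, but this is one of them. As far as I can see, that check is the only problem: once the first `if` is fixed you can increment right after it and keep the rest of the code as it is.

I may have made some (syntax) mistakes in my code since I haven't tested it, but the logic applies. Great code!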
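To make the "increment" step concrete, here is a minimal sketch of the loop, not the asker's actual code. It assumes a hypothetical `fetch_html` callable that returns the HTML for a given page number, and it counts matches with plain `re` on the raw HTML so that it runs without a browser. The patterns come from the question; `count_over_pages`, `PATTERNS` and `fake_pages` are made-up names.

```python
import re

# Patterns taken from the question; the dictionary keys are illustrative names.
PATTERNS = {
    "scam": re.compile(r"Scam"),
    "spam": re.compile(r"Call type: Telemarketer"),
    "debt": re.compile(r"Call type: Debt Collector"),
    "hospital": re.compile(r"Hospital"),
}


def count_over_pages(fetch_html, page_count):
    """Visit pages 1..page_count and total the pattern matches.

    fetch_html is a hypothetical callable: page number -> HTML string.
    """
    totals = {name: 0 for name in PATTERNS}
    for page in range(1, page_count + 1):  # a loop, not a single `if`
        html = fetch_html(page)            # re-read the page on every pass
        for name, pattern in PATTERNS.items():
            totals[name] += len(pattern.findall(html))
    return totals


# Stand-in pages so the sketch runs without Selenium.
fake_pages = {
    1: "<p>Scam</p><p>Call type: Telemarketer</p>",
    2: "<p>Scam</p><p>Hospital</p><p>Scam</p>",
}
totals = count_over_pages(lambda n: fake_pages[n], page_count=2)
print(totals)
```

With Selenium, `fetch_html` could call `driver.get(...)` for the page URL and return `driver.page_source`. To keep the `soup.find_all(...)` counting from the question, build a fresh `BeautifulSoup(html, 'html.parser')` inside the loop, since a soup object created before `driver.get` still holds the old page. Note that counting regex hits in raw HTML is a simplification and can differ from counting matching text nodes.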

Also, instead of

             type(block) is str 
             type(extrablock) is str

the Pythonic way is to use isinstance:

isinstance(block, str)
isinstance(extrablock, str)
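A small sketch of why this matters, under the assumption that the value returned by `soup.find(text=...)` is a subclass of `str` rather than a plain `str` (as far as I know, bs4's `NavigableString` is one). `FakeNavigableString` is a hypothetical stand-in so the example runs without BeautifulSoup. Note also that in the question both checks are bare expressions, so their results are thrown away; the result has to be used in a condition to have any effect.

```python
class FakeNavigableString(str):
    """Hypothetical stand-in for a str subclass such as bs4's NavigableString."""


block = FakeNavigableString("OctoNet HTTP filter")
extrablock = None  # what find() gives back when nothing matches

exact_type_match = type(block) is str            # exact check rejects subclasses
is_string = isinstance(block, str)               # isinstance accepts subclasses
missing_is_string = isinstance(extrablock, str)  # None is not a str

# Use the result in a condition instead of discarding it.
if isinstance(block, str) or isinstance(extrablock, str):
    print("block page detected")
```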

And instead of time.sleep you can use WebDriverWait. Selenium offers two kinds of waits, implicit and explicit; take a look at the documentation on waits.
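To show the idea behind an explicit wait without needing a browser, here is a stdlib-only sketch: poll a condition until it is truthy or a timeout expires, instead of sleeping for a fixed time. This is a stand-in, not Selenium's implementation; `wait_until` and `page_ready` are hypothetical names. In real code, `WebDriverWait(driver, timeout).until(condition)` does this polling for you.

```python
import time


def wait_until(condition, timeout=4.0, poll=0.1):
    """Poll condition() until it returns something truthy or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in condition that becomes true on the third check.
calls = {"n": 0}


def page_ready():
    calls["n"] += 1
    return calls["n"] >= 3


ready = wait_until(page_ready, timeout=2.0, poll=0.01)
print(ready, calls["n"])
```

The benefit over a fixed `time.sleep` is that the script continues as soon as the page is ready and fails loudly if it never is.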

How do I increment through pages using Selenium and BeautifulSoup?
