Question

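I wrote a block of code to search a web page for some arbitrary text. The page has multiple tabs, which I navigate using selenium. The problem is that the text I am trying to find is not fixed to one particular page; it can be in any tab of the web page. If the text is not found, an exception is raised. When that happens, the code should move on to the next tab and search there. I am having difficulty handling the exceptions.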

Below is the code I tried.

import requests
from bs4 import BeautifulSoup
import re
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("https://www.yxx.com/71463001")
a = driver.page_source
soup = BeautifulSoup(a, "html.parser")

try:
    head = soup.find_all("div", {"style":"overflow:hidden;max-height:25px"})
    head_str = str(head)
    z = re.search('B00.{7}', head_str).group(0)
    print z
    print 'header'
except AttributeError:
    g_info = soup.find_all("div", {"id":"details_readonly"})
    g_info1=str(g_info)
    x = re.search('B00.{7}', g_info1).group(0)
    print x
    print 'description'
except AttributeError:
    corre = driver.find_element_by_id("tab_correspondence")
    corre.click()
    corr_g_info = soup.find_all("table", {"id" : "correspondence_view"})
    corr_g_info1=str(corr_g_info)
    print corr_g_info
    y = re.search('B00.{7}', corr_g_info1).group(0)
    print y
    print 'correspondance'

When I run this code, I get an error:

Traceback (most recent call last):
  File "C:\Python27\BS.py", line 21, in <module>
    x = re.search('B00.{7}', g_info1).group(0)
AttributeError: 'NoneType' object has no attribute 'group'

Answer 1

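You are getting that error because you are calling group on the result of re.search when that result contains nothing: re.search returns None when the pattern does not match, and None has no group attribute. When I ran your code it failed because the page you are trying to connect to is not currently up.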
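To make the None case concrete, here is a minimal sketch that checks the match before calling group. The helper name find_code and the sample strings are made up for illustration; only the pattern comes from the question. It uses print() with parentheses so it also runs on Python 3.

```python
import re

# Same pattern as in the question: "B00" followed by any 7 characters.
PATTERN = re.compile(r"B00.{7}")

def find_code(text):
    """Return the first match as a string, or None if nothing matches."""
    match = PATTERN.search(text)  # a match object, or None on no match
    return match.group(0) if match else None

print(find_code("id: B00ABCDEFG end"))  # B00ABCDEFG
print(find_code("nothing here"))        # None
```

Returning None instead of raising lets the caller decide what to try next with a plain if, without relying on AttributeError.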

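As for why the except is not catching it: you mistakenly wrote two excepts for a single try. A try only catches exceptions raised in its own body, that is, the code before the first except. An AttributeError raised inside the first except block is not handled by the second except of the same try.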
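This behaviour can be reproduced without any web page. The sketch below is only an illustration (the function name sibling_handlers is hypothetical); it uses None.group(0) to stand in for a failed re.search(...).group(0).

```python
def sibling_handlers():
    """Return the steps reached, to show which handlers actually run."""
    steps = []
    try:
        try:
            None.group(0)  # raises AttributeError, like a failed re.search
        except AttributeError:
            steps.append("first handler")
            None.group(0)  # raised inside a handler: sibling clauses are skipped
        except AttributeError:
            steps.append("second handler")  # never reached
    except AttributeError:
        # The second error propagates out of the inner try statement.
        steps.append("escaped to outer try")
    return steps

print(sibling_handlers())  # ['first handler', 'escaped to outer try']
```

In the question's script there is no outer try, so the second AttributeError goes uncaught and ends the script with the traceback shown.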

If you change line 19 to x = re.search('B00.{7}', g_info1), the code runs again and prints None and description, because the page is currently not up.

Alternatively, to achieve what I think your goal is, nesting try / except is an option:

try:
    head = soup.find_all("div", {"style":"overflow:hidden;max-height:25px"})
    head_str = str(head)
    z = re.search('B00.{7}', head_str).group(0)
    print z
    print 'header'
except AttributeError:
    try:
        g_info = soup.find_all("div", {"id":"details_readonly"})
        g_info1=str(g_info)
        x = re.search('B00.{7}', g_info1)
        print x
        print 'description'
    except AttributeError:
        corre = driver.find_element_by_id("tab_correspondence")
        corre.click()
        corr_g_info = soup.find_all("table", {"id" : "correspondence_view"})
        corr_g_info1=str(corr_g_info)
        print corr_g_info
        y = re.search('B00.{7}', corr_g_info1).group(0)
        print y
        print 'correspondance'

Of course, this code currently throws a NameError, because there is no information on the site to define the corr_g_info variable.
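If more tabs are added later, the nesting gets deeper with each one. One possible alternative, sketched here under assumptions, is to loop over the places to search and stop at the first match. The name first_match, the (label, loader) pairs and the stand-in strings are all hypothetical; in the real script each loader would return str(soup.find_all(...)) for the corresponding tab.

```python
import re

PATTERN = re.compile(r"B00.{7}")  # pattern taken from the question

def first_match(sources):
    """Search each source in order and return (label, code) for the first hit.

    sources is a list of (label, loader) pairs, where loader is a
    zero-argument callable returning the text to search. Loaders run
    lazily, so later tabs are only loaded when earlier ones fail.
    """
    for label, loader in sources:
        match = PATTERN.search(loader())
        if match:
            return label, match.group(0)
    return None, None  # nothing matched in any source

# Stand-in strings instead of a live page (assumption for illustration).
sources = [
    ("header", lambda: "<div>no code here</div>"),
    ("description", lambda: "<div id='details_readonly'></div>"),
    ("correspondence", lambda: "<table>ref B00XYZ1234</table>"),
]
print(first_match(sources))  # ('correspondence', 'B00XYZ1234')
```

One thing to watch in the real script: soup is built from driver.page_source before the tab is clicked, so the loader for the correspondence tab would probably need to click the tab and then parse driver.page_source again.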
