Question

In a script I wrote to scrape some data, there is a whole bunch of code that looks like this:

try:
  prize = row.find_element(By.XPATH, './div[contains(@class, "divCell")][3]').text
except:
  prize = ''
try:
  field = row.find_element(By.XPATH, './div[contains(@class, "divCell")][4]').text
except:
  field = ''
try:
  country = row.find_element(By.XPATH, './div[contains(@class, "divCell")][5]/span[1]/a').get_attribute('title')
except:
  country = ''
try:
  city = row.find_element(By.XPATH, './div[contains(@class, "divCell")][5]/span[2]').text
except:
  city = ''
try:
  winner = row.find_element(By.XPATH, './div[contains(@class, "divCell")][6]/span[2]/span').get_attribute('data-highlightingclass')
except:
  winner = ''
try:
  runnerup = row.find_element(By.XPATH, './div[contains(@class, "divCell")][7]/span[2]/span').get_attribute('data-highlightingclass')
except:
  runnerup = ''

I'm new to Python and would like to know whether there is an alternative, or more concise, way to do this.

Answer 1

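Preface: please provide a Minimal, Complete, and Verifiable example to help us help you.

I'll assume you are using Selenium.

You have several options here.

One clause to catch them all

If all of the elements are mandatory, you are better off with a single, larger try-except: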

from selenium.common.exceptions import NoSuchElementException

try:
    prize = row.find_element_by_xpath('./div[contains(@class, "divCell")][3]').text
    field = row.find_element_by_xpath('./div[contains(@class, "divCell")][4]').text
    country = row.find_element_by_xpath('./div[contains(@class, "divCell")][5]/span[1]/a').get_attribute('title')
    ...
except NoSuchElementException:
    # Do something smart here, e.g. log the incomplete row and skip it
    pass

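(Note: when this answer was written, the Selenium documentation recommended the WebDriver.find_element_by_xpath method over calling WebDriver.find_element directly. That advice is now outdated: the find_element_by_* methods were deprecated in Selenium 4 and removed in 4.3, so with a current Selenium use row.find_element(By.XPATH, ...), as the question does.)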
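The all-or-nothing behaviour is easier to see in a runnable sketch. It assumes hypothetical stand-in classes (FakeRow, FakeElement and a local NoSuchElementException, none of them Selenium's own) that mimic the find_element_by_xpath call used above, plus an illustrative parse_row_strict function: a row with every cell yields a dict, while a row missing any cell is skipped as a whole.

```python
# Hypothetical stand-ins so the sketch runs without a browser; they mimic
# only the parts of Selenium 3's WebElement API that this answer uses.
class NoSuchElementException(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""


class FakeElement:
    def __init__(self, text="", attrs=None):
        self.text = text
        self._attrs = attrs or {}

    def get_attribute(self, name):
        return self._attrs.get(name)


class FakeRow:
    """Maps exact XPath strings to FakeElements; raises if a path is missing."""

    def __init__(self, elements):
        self._elements = elements

    def find_element_by_xpath(self, xpath):
        if xpath not in self._elements:
            raise NoSuchElementException(xpath)
        return self._elements[xpath]


CELL = './div[contains(@class, "divCell")]'


def parse_row_strict(row):
    """All-or-nothing: a dict if every cell is present, otherwise None."""
    try:
        return {
            "prize": row.find_element_by_xpath(CELL + "[3]").text,
            "field": row.find_element_by_xpath(CELL + "[4]").text,
            "country": row.find_element_by_xpath(
                CELL + "[5]/span[1]/a"
            ).get_attribute("title"),
        }
    except NoSuchElementException:
        # One missing cell marks the whole row as incomplete, so skip it.
        return None


complete = FakeRow({
    CELL + "[3]": FakeElement("$100"),
    CELL + "[4]": FakeElement("Chess"),
    CELL + "[5]/span[1]/a": FakeElement(attrs={"title": "Norway"}),
})
incomplete = FakeRow({CELL + "[3]": FakeElement("$100")})

print(parse_row_strict(complete))    # {'prize': '$100', 'field': 'Chess', 'country': 'Norway'}
print(parse_row_strict(incomplete))  # None
```

The trade-off is that a single absent cell discards the whole row, which is why this shape only suits fields that are genuinely mandatory.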

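Encapsulation

(Suggested by @vks.)

Instead of calling the method that raises the exception directly, you can wrap it in a function that returns None: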

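from selenium.common.exceptions import NoSuchElementException

def find_element_by_xpath_or_None(haystack, xpath):
    try:
        return haystack.find_element_by_xpath(xpath)
    except NoSuchElementException:
        # Missing element: report it as None instead of raising
        return None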

Then use it like this:

prize = find_element_by_xpath_or_None(row, './div[contains(@class, "divCell")][3]')
prize = prize.text if prize else ''

field = find_element_by_xpath_or_None(row, './div[contains(@class, "divCell")][4]')
field = field.text if field else ''

country = find_element_by_xpath_or_None(row, './div[contains(@class, "divCell")][5]/span[1]/a')
country = country.get_attribute('title') if country else ''
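To try the wrapper without a browser, here is a runnable sketch against hypothetical stand-in classes (FakeRow, FakeElement and a local NoSuchElementException, none of them part of Selenium). It also folds the two-line "find, then .text or ''" pattern into a hypothetical text_or_empty helper, and checks the result with "is not None" so that only a missing element falls back to ''.

```python
# Hypothetical stand-ins mimicking the Selenium 3 call used in this answer.
class NoSuchElementException(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""


class FakeElement:
    def __init__(self, text=""):
        self.text = text


class FakeRow:
    """Maps exact XPath strings to FakeElements; raises if a path is missing."""

    def __init__(self, elements):
        self._elements = elements

    def find_element_by_xpath(self, xpath):
        if xpath not in self._elements:
            raise NoSuchElementException(xpath)
        return self._elements[xpath]


def find_element_by_xpath_or_None(haystack, xpath):
    try:
        return haystack.find_element_by_xpath(xpath)
    except NoSuchElementException:
        return None


def text_or_empty(haystack, xpath):
    """Hypothetical convenience wrapper: the element's text, or '' if missing."""
    element = find_element_by_xpath_or_None(haystack, xpath)
    return element.text if element is not None else ""


CELL = './div[contains(@class, "divCell")]'
row = FakeRow({CELL + "[3]": FakeElement("$100")})

print(text_or_empty(row, CELL + "[3]"))        # $100
print(repr(text_or_empty(row, CELL + "[4]")))  # ''
```

With this, each text field in the question becomes a single line such as prize = text_or_empty(row, CELL + "[3]").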

Edit: this also works with lambdas.

Full encapsulation

You can make it even leaner by using a lambda to state explicitly what to extract:

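from selenium.common.exceptions import NoSuchElementException

def find_element_by_xpath_or_None(haystack, xpath, access_fun):
    # access_fun decides what to read from the element (its text, an attribute, ...)
    try:
        return access_fun(haystack.find_element_by_xpath(xpath))
    except NoSuchElementException:
        return None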

and then:

field = find_element_by_xpath_or_None(
    row, './div[contains(@class, "divCell")][4]',
    lambda e: e.text
) or ''

country = find_element_by_xpath_or_None(
    row, './div[contains(@class, "divCell")][5]/span[1]/a',
    lambda e: e.get_attribute('title')
) or ''
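Combining the full-encapsulation helper with a table of (name, XPath, accessor) entries removes the remaining per-field repetition across the question's six fields. This is a minimal sketch, again using hypothetical stand-ins for Selenium's row and exception; FIELDS, attr and parse_row are illustrative names, not part of any library.

```python
# Hypothetical stand-ins mimicking the Selenium 3 calls used in this answer.
class NoSuchElementException(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""


class FakeElement:
    def __init__(self, text="", attrs=None):
        self.text = text
        self._attrs = attrs or {}

    def get_attribute(self, name):
        return self._attrs.get(name)


class FakeRow:
    def __init__(self, elements):
        self._elements = elements

    def find_element_by_xpath(self, xpath):
        if xpath not in self._elements:
            raise NoSuchElementException(xpath)
        return self._elements[xpath]


def find_element_by_xpath_or_None(haystack, xpath, access_fun):
    try:
        return access_fun(haystack.find_element_by_xpath(xpath))
    except NoSuchElementException:
        return None


def attr(name):
    """Build an accessor that reads one attribute from an element."""
    return lambda e: e.get_attribute(name)


text = lambda e: e.text
CELL = './div[contains(@class, "divCell")]'

# (field name, XPath relative to the row, how to read the element)
FIELDS = [
    ("prize", CELL + "[3]", text),
    ("field", CELL + "[4]", text),
    ("country", CELL + "[5]/span[1]/a", attr("title")),
    ("city", CELL + "[5]/span[2]", text),
    ("winner", CELL + "[6]/span[2]/span", attr("data-highlightingclass")),
    ("runnerup", CELL + "[7]/span[2]/span", attr("data-highlightingclass")),
]


def parse_row(row):
    # Missing elements (None) and absent attributes both become ''.
    return {
        name: find_element_by_xpath_or_None(row, xpath, access) or ""
        for name, xpath, access in FIELDS
    }


row = FakeRow({
    CELL + "[3]": FakeElement("$100"),
    CELL + "[5]/span[1]/a": FakeElement(attrs={"title": "Norway"}),
})
print(parse_row(row))
# {'prize': '$100', 'field': '', 'country': 'Norway', 'city': '', 'winner': '', 'runnerup': ''}
```

To use this on real rows, drop the stand-ins and import NoSuchElementException from selenium.common.exceptions; on Selenium 4.3 or later, also change the call inside the helper to haystack.find_element(By.XPATH, xpath). Adding a field then means adding one line to FIELDS.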

Alternatives to repeated try-except blocks
