In a script I wrote to scrape some data, I have a whole bunch of code that looks like this:
try:
    prize = row.find_element(By.XPATH, './div[contains(@class, "divCell")][3]').text
except:
    prize = ''
try:
    field = row.find_element(By.XPATH, './div[contains(@class, "divCell")][4]').text
except:
    field = ''
try:
    country = row.find_element(By.XPATH, './div[contains(@class, "divCell")][5]/span[1]/a').get_attribute('title')
except:
    country = ''
try:
    city = row.find_element(By.XPATH, './div[contains(@class, "divCell")][5]/span[2]').text
except:
    city = ''
try:
    winner = row.find_element(By.XPATH, './div[contains(@class, "divCell")][6]/span[2]/span').get_attribute('data-highlightingclass')
except:
    winner = ''
try:
    runnerup = row.find_element(By.XPATH, './div[contains(@class, "divCell")][7]/span[2]/span').get_attribute('data-highlightingclass')
except:
    runnerup = ''
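I'm new to Python and was wondering whether there is an alternative or cleaner way to achieve this?
Answer 0 (score: 0)
Preamble: please provide a Minimal, Complete, and Verifiable example so that we can help you.
I will assume that you are using Selenium.
You have several options here.
If all of the elements are mandatory, it is better to use a single, larger try/except block: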
from selenium.common.exceptions import NoSuchElementException

try:
    prize = row.find_element_by_xpath('./div[contains(@class, "divCell")][3]').text
    field = row.find_element_by_xpath('./div[contains(@class, "divCell")][4]').text
    country = row.find_element_by_xpath('./div[contains(@class, "divCell")][5]/span[1]/a').get_attribute('title')
    ...
except NoSuchElementException:
    # Do something smart
    pass
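(Note: when this answer was written, the Selenium documentation recommended using the method WebDriver.find_element_by_xpath rather than calling WebDriver.find_element directly. The find_element_by_* helpers were deprecated in Selenium 4 and removed in later 4.x releases; on those versions, use find_element(By.XPATH, ...) as in the question.)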
(Suggested by @vks.)
Instead of directly calling the method that raises an exception, you can use a wrapper method that returns None:
def find_element_by_xpath_or_None(haystack, xpath):
    try:
        return haystack.find_element_by_xpath(xpath)
    except NoSuchElementException:
        return None
Then use it as follows:
prize = find_element_by_xpath_or_None(row, './div[contains(@class, "divCell")][3]')
prize = prize.text if prize else ''
field = find_element_by_xpath_or_None(row, './div[contains(@class, "divCell")][4]')
field = field.text if field else ''
country = find_element_by_xpath_or_None(row, './div[contains(@class, "divCell")][5]/span[1]/a')
country = country.get_attribute('title') if country else ''
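For Selenium 4, where the question's find_element(By.XPATH, ...) form is used, the same wrapper idea could be sketched roughly as below. This is only a sketch and not part of the original answer: find_or_none, FakeElement and FakeRow are hypothetical names, and the fake classes (together with the ImportError fallback) exist only so that the snippet can be run without a browser or without Selenium installed.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
except ImportError:
    # Stand-ins so that the sketch also runs without Selenium installed.
    class NoSuchElementException(Exception):
        pass

    class By:
        XPATH = "xpath"


def find_or_none(haystack, xpath):
    """Return the first element matching xpath below haystack, or None."""
    try:
        return haystack.find_element(By.XPATH, xpath)
    except NoSuchElementException:
        return None


class FakeElement:
    """Hypothetical stand-in for a WebElement (demonstration only)."""

    def __init__(self, text=""):
        self.text = text


class FakeRow:
    """Hypothetical stand-in for a table row (demonstration only)."""

    def __init__(self, elements):
        self._elements = elements  # maps XPath -> FakeElement

    def find_element(self, by, xpath):
        if xpath not in self._elements:
            raise NoSuchElementException(xpath)
        return self._elements[xpath]


row = FakeRow({'./div[contains(@class, "divCell")][3]': FakeElement("Gold")})

prize = find_or_none(row, './div[contains(@class, "divCell")][3]')
prize = prize.text if prize is not None else ''
field = find_or_none(row, './div[contains(@class, "divCell")][4]')
field = field.text if field is not None else ''
```

Comparing against None explicitly avoids relying on the truthiness of whatever object the lookup returns.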
Edit: this also works with a lambda.
You can make it even leaner by using a lambda to state explicitly what you want to extract:
def find_element_by_xpath_or_None(haystack, xpath, access_fun):
    try:
        return access_fun(haystack.find_element_by_xpath(xpath))
    except NoSuchElementException:
        return None
And:
field = find_element_by_xpath_or_None(
    row, './div[contains(@class, "divCell")][4]',
    lambda e: e.text
) or ''
country = find_element_by_xpath_or_None(
    row, './div[contains(@class, "divCell")][5]/span[1]/a',
    lambda e: e.get_attribute('title')
) or ''
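Taking the lambda idea one step further, the XPath and accessor for each column could be kept in one table and processed in a loop, so that the try/except appears only once. The following is a sketch under assumptions rather than a definitive implementation: FIELDS, CELL, extract_row, FakeElement and FakeRow are hypothetical names, the XPaths are copied from the question, and it uses the find_element(By.XPATH, ...) form from the question. The fake classes and the ImportError fallback are there only so that the snippet runs without a browser.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
except ImportError:
    # Stand-ins so that the sketch also runs without Selenium installed.
    class NoSuchElementException(Exception):
        pass

    class By:
        XPATH = "xpath"

CELL = './div[contains(@class, "divCell")]'

# Column name -> (XPath relative to the row, function extracting the value).
FIELDS = {
    "prize": (CELL + "[3]", lambda e: e.text),
    "field": (CELL + "[4]", lambda e: e.text),
    "country": (CELL + "[5]/span[1]/a", lambda e: e.get_attribute("title")),
    "city": (CELL + "[5]/span[2]", lambda e: e.text),
    "winner": (CELL + "[6]/span[2]/span",
               lambda e: e.get_attribute("data-highlightingclass")),
    "runnerup": (CELL + "[7]/span[2]/span",
                 lambda e: e.get_attribute("data-highlightingclass")),
}


def extract_row(row, fields=FIELDS, default=""):
    """Return one dict per row; missing elements or attributes give default."""
    record = {}
    for name, (xpath, access_fun) in fields.items():
        try:
            value = access_fun(row.find_element(By.XPATH, xpath))
        except NoSuchElementException:
            value = None
        record[name] = value or default
    return record


class FakeElement:
    """Hypothetical stand-in for a WebElement (demonstration only)."""

    def __init__(self, text="", attrs=None):
        self.text = text
        self._attrs = attrs or {}

    def get_attribute(self, name):
        return self._attrs.get(name)


class FakeRow:
    """Hypothetical stand-in for a table row (demonstration only)."""

    def __init__(self, elements):
        self._elements = elements  # maps XPath -> FakeElement

    def find_element(self, by, xpath):
        if xpath not in self._elements:
            raise NoSuchElementException(xpath)
        return self._elements[xpath]


row = FakeRow({
    CELL + "[3]": FakeElement(text="Gold"),
    CELL + "[5]/span[1]/a": FakeElement(attrs={"title": "France"}),
})
record = extract_row(row)
```

Adding a column then means adding one entry to FIELDS, and each row yields a dict that can be passed on as-is, for example to csv.DictWriter.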