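I have built a web scraper using Python and Beautiful Soup.
Some elements are present on some pages and missing on others, and I look up a lot of them. It doesn't make sense to write custom exception handling for every "find" and/or "find_all" call.
I just want to ignore these errors so that my scraper doesn't stop when an exception is raised. This is the error output from my terminal: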
Traceback (most recent call last):
  File "listing-scraper.py", line 80, in <module>
    'engine_size':soup.find("span",{"id":"infoEngine Size"}).contents[0],
AttributeError: 'NoneType' object has no attribute 'contents'
How can I make the scraper carry on?
Here is a snippet of my source code so you can see how it is set up. (Please be nice, I'm new to Python.)
dealer_info = {
    'name':dealer_box.find("h4").contents[0],
    'address':dealer_address,
    'phone':re.sub(r'[^\d.]+','',soup.find("div",{"class":"PhoneNumber"}).contents[0]),
    'logo':soup.find("div",{"class":"dealerLogo"}).img['src'],
    'about':dealer_about,
    'website':website,
    'video':dealer_video
}
thumbnails = soup.find("div",{"class":"imageThumbs"}).find_all('img')
dealer_thumbnails = []
for thumbnail in thumbnails:
    dealer_thumbnails.append(thumbnail['src'])
motorcycle = {
    'insert_date':time.time() * 1000,
    'year':soup.find("span",{"id":"infoYear"}).contents[0],
    'make':soup.find("span",{"id":"infoMake"}).contents[0],
    'model':soup.find("span",{"id":"infoModel"}).contents[0],
    'type':soup.find("span",{"id":"infoType"}).contents[0],
    'location':soup.find("span",{"id":"infoLocation"}).contents[0],
    'color':soup.find("span",{"id":"infoColor"}).contents[0],
    'engine_size':soup.find("span",{"id":"infoEngine Size"}).contents[0],
    'description':description,
    'price':soup.find("h3",{"class":"askingPriceNumber"}).contents[1],
    'thumbnails':dealer_thumbnails,
    'dealer_info':dealer_info
}
listing.update(motorcycle)
Answer 0 (score: 2)
Consider something like this:
def getcontents(item, index):
    if item is None:
        return None
    return item.contents[index]
motorcycle = {
    'insert_date':time.time() * 1000,
    'year':getcontents(soup.find("span",{"id":"infoYear"}), 0),
    ...
In general, you shouldn't ignore an exception if you can avoid causing it in the first place.
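A slightly more defensive variant of the same idea might look like the sketch below. It is only an illustration, not something from the question: the name get_contents and the default parameter are made up here, and the SimpleNamespace objects are stand-ins for what soup.find() returns (an object with a .contents list, or None when nothing matches), so the snippet runs without Beautiful Soup installed. It also covers the case where the element exists but has fewer children than expected.

```python
from types import SimpleNamespace


def get_contents(item, index=0, default=None):
    """Return item.contents[index], or default if the element or child is missing."""
    if item is None:
        # find() found nothing, so there is no .contents to read
        return default
    contents = item.contents
    if not -len(contents) <= index < len(contents):
        # the element exists but has no child at that position
        return default
    return contents[index]


# Stand-ins for the results of soup.find(...); hypothetical data for illustration.
found = SimpleNamespace(contents=["2012", "extra"])
empty = SimpleNamespace(contents=[])
missing = None

motorcycle = {
    'year': get_contents(found),
    'color': get_contents(missing),
    'engine_size': get_contents(empty, 0, default=''),
}
print(motorcycle)
```

With the real scraper you would pass soup.find("span", {"id": "infoYear"}) as the first argument, exactly as in the answer's getcontents call. The missing fields simply end up as None (or whatever default you pick), and the script keeps going.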