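I have a problem: my script skips a restaurant whenever any of its elements is missing. I want the script to scrape every restaurant and record "N/A" for each element that does not exist.
Full code: https://pastebin.com/af577pCM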
response = requests.get("https://www.zomato.com/san-francisco/restaurants?page=807", headers=headers)
content = response.content
bs = BeautifulSoup(content, "html.parser")
zomato_containers = bs.find_all("div", {"class": "search-snippet-card"})
for zomato_container in zomato_containers:
    title = zomato_container.find("a", {"class": "result-title"}).get_text()
    try:
        address = zomato_container.find("div", {"class": "search-result-address"}).get_text()
        if address is None:
            address = 'N/A'
        district = zomato_container.find("a", {"class": "search_result_subzone"}).get_text()
        if district is None:
            district = 'N/A'
        cost_for_two = zomato_container.select_one('[class*="col-s-11 col-m-12 pl0"]').get_text(separator='|', strip=True).split('|')
        cost_for_two = cost_for_two[1] if len(cost_for_two) > 1 else cost_for_two[0]
        if cost_for_two is None:
            cost_for_two = 'N/A'
        cuisines = zomato_container.find("div", {"class": "res-snippet-small-establishment mt5"}).get_text()
        if cuisines is None:
            cuisines = 'N/A'
        rating = zomato_container.select_one('.rating-popup').text.strip()
        if rating is None:
            rating = 'N/A'
        numVotes = zomato_container.select_one('[class^=rating-votes-div]').text # match on elements with class attribute whose values starts with rating-votes-div
        if numVotes is None:
            numVotes = 'N/A'
    except AttributeError:
        continue
    print("restaurant_title: ", title)
    print("restaurant_address: ", address)
    print("restaurant_district: ", district)
    print("cost_for_two: ", cost_for_two)
    print("restaurant_cuisines: ", cuisines)
    print("rating: ", rating)
    print("numVotes: ", numVotes)
Answer 0 (score: 2)
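There are ways to make this more concise overall, but the idea is simply to catch the error for each element in its own try block instead of wrapping everything in one large block. Restaurants get skipped because find and select_one return None when nothing matches, so calling get_text() on the result raises AttributeError, and as soon as a single element fails, execution jumps straight to continue. The `is None` checks are never reached in that case. A better approach is to rewrite the whole loop as follows: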
for zomato_container in zomato_containers:
    title = zomato_container.find("a", {"class": "result-title"}).get_text()
    address = None
    district = None
    cost_for_two = None
    cuisines = None
    rating = None
    numVotes = None
    # Each lookup gets its own try block, so one missing element
    # no longer causes the whole restaurant to be skipped.
    try:
        address = zomato_container.find("div", {"class": "search-result-address"}).get_text()
    except AttributeError:  # find() returned None
        address = 'N/A'
    try:
        district = zomato_container.find("a", {"class": "search_result_subzone"}).get_text()
    except AttributeError:
        district = 'N/A'
    try:
        cost_for_two = zomato_container.select_one('[class*="col-s-11 col-m-12 pl0"]').get_text(separator='|', strip=True).split('|')
        cost_for_two = cost_for_two[1] if len(cost_for_two) > 1 else cost_for_two[0]
    except AttributeError:
        cost_for_two = 'N/A'
    try:
        cuisines = zomato_container.find("div", {"class": "res-snippet-small-establishment mt5"}).get_text()
    except AttributeError:
        cuisines = 'N/A'
    try:
        rating = zomato_container.select_one('.rating-popup').text.strip()
    except AttributeError:
        rating = 'N/A'
    try:
        numVotes = zomato_container.select_one('[class^=rating-votes-div]').text
    except AttributeError:
        numVotes = 'N/A'
    print("restaurant_title: ", title)
    print("restaurant_address: ", address)
    print("restaurant_district: ", district)
    print("cost_for_two: ", cost_for_two)
    print("restaurant_cuisines: ", cuisines)
    print("rating: ", rating)
    print("numVotes: ", numVotes)
    # f is presumably the csv writer created in the full script linked in the question
    f.writerow([title, address, district, cost_for_two, cuisines, rating, numVotes])
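This gives the expected result: restaurants with missing elements are no longer skipped, and the missing fields are reported as 'N/A'.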
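Overall, the best approach is to write a function that performs the try/except logic for you whenever you look up an attribute or element on the page, and to use that function to make the code more concise and the logic tighter (without violating the DRY principle).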
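A minimal sketch of such a helper might look like the following. The names text_or_na and scrape_card are hypothetical, not part of the original script, and the CSS selectors are approximations of the find() calls above (they match on a single class rather than the exact class string). The container is assumed to be a BeautifulSoup Tag, although anything with a select_one method would work. cost_for_two is left out because it needs the extra split step.

```python
def text_or_na(container, selector, default="N/A"):
    """Return the stripped text of the first match for a CSS selector, or default.

    `container` is assumed to be a BeautifulSoup Tag (hypothetical helper).
    """
    try:
        # select_one returns None when nothing matches, so calling
        # get_text on the result raises AttributeError.
        text = container.select_one(selector).get_text(strip=True)
    except AttributeError:
        return default
    # Treat an element that exists but has no text as missing too.
    return text or default


def scrape_card(container):
    """Collect the fields of one restaurant card into a dict (hypothetical helper)."""
    return {
        "title": text_or_na(container, "a.result-title"),
        "address": text_or_na(container, "div.search-result-address"),
        "district": text_or_na(container, "a.search_result_subzone"),
        "cuisines": text_or_na(container, "div.res-snippet-small-establishment"),
        "rating": text_or_na(container, ".rating-popup"),
        "numVotes": text_or_na(container, "[class^=rating-votes-div]"),
    }
```

With something like this in place, the loop body shrinks to row = scrape_card(zomato_container), and each lookup still fails independently of the others.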