Scrape everything, even if some elements are missing

Time: 2019-08-10 10:05:34

Tags: python beautifulsoup

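I have a problem: my script skips a restaurant whenever any of its elements is missing. I want the script to scrape every restaurant and fill in 'N/A' for each element that does not exist.

Full code: https://pastebin.com/af577pCM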

response = requests.get("https://www.zomato.com/san-francisco/restaurants?page=807", headers=headers)
content = response.content
bs = BeautifulSoup(content, "html.parser")

zomato_containers = bs.find_all("div", {"class": "search-snippet-card"})

for zomato_container in zomato_containers:

    title = zomato_container.find("a", {"class": "result-title"}).get_text()

    try:
        address = zomato_container.find("div", {"class": "search-result-address"}).get_text()
        if address is None:
            address = 'N/A'
        district = zomato_container.find("a", {"class": "search_result_subzone"}).get_text()
        if district is None:
            district = 'N/A'
        cost_for_two = zomato_container.select_one('[class*="col-s-11 col-m-12 pl0"]').get_text(separator='|', strip=True).split('|')
        cost_for_two = cost_for_two[1] if len(cost_for_two) > 1 else cost_for_two[0]
        if cost_for_two is None:
            cost_for_two = 'N/A'
        cuisines = zomato_container.find("div", {"class": "res-snippet-small-establishment mt5"}).get_text()
        if cuisines is None:
            cuisines = 'N/A'
        rating = zomato_container.select_one('.rating-popup').text.strip()
        if rating is None:
            rating = 'N/A'
        numVotes = zomato_container.select_one('[class^=rating-votes-div]').text  # match elements whose class attribute value starts with rating-votes-div
        if numVotes is None:
            numVotes = 'N/A'

    except AttributeError:
        continue

    print("restaurant_title: ", title)
    print("restaurant_address: ", address)
    print("restaurant_district: ", district)
    print("cost_for_two: ", cost_for_two)
    print("restaurant_cuisines: ", cuisines)
    print("rating: ", rating)
    print("numVotes: ", numVotes)


1 answer:

Answer 0 (score: 2)

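There are ways to make this more concise overall, but the core idea is to catch the error for each element in its own try block rather than wrapping everything in one large block. Restaurants are skipped because find() and select_one() return None for a missing element, so the chained .get_text() raises AttributeError before any of the "is None" checks can run, and the single except clause then jumps straight to continue. A better approach is to rewrite the loop as follows: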

for zomato_container in zomato_containers:

    title = zomato_container.find("a", {"class": "result-title"}).get_text()

    # find()/select_one() return None when an element is missing, so calling
    # .get_text()/.text on the result raises AttributeError. Each field gets
    # its own try block so one missing element cannot affect the others.
    try:
        address = zomato_container.find("div", {"class": "search-result-address"}).get_text()
    except AttributeError:
        address = 'N/A'

    try:
        district = zomato_container.find("a", {"class": "search_result_subzone"}).get_text()
    except AttributeError:
        district = 'N/A'

    try:
        cost_for_two = zomato_container.select_one('[class*="col-s-11 col-m-12 pl0"]').get_text(separator='|', strip=True).split('|')
        cost_for_two = cost_for_two[1] if len(cost_for_two) > 1 else cost_for_two[0]
    except AttributeError:
        cost_for_two = 'N/A'

    try:
        cuisines = zomato_container.find("div", {"class": "res-snippet-small-establishment mt5"}).get_text()
    except AttributeError:
        cuisines = 'N/A'

    try:
        rating = zomato_container.select_one('.rating-popup').text.strip()
    except AttributeError:
        rating = 'N/A'

    try:
        numVotes = zomato_container.select_one('[class^=rating-votes-div]').text
    except AttributeError:
        numVotes = 'N/A'

    print("restaurant_title: ", title)
    print("restaurant_address: ", address)
    print("restaurant_district: ", district)
    print("cost_for_two: ", cost_for_two)
    print("restaurant_cuisines: ", cuisines)
    print("rating: ", rating)
    print("numVotes: ", numVotes)

    f.writerow([title, address, district, cost_for_two, cuisines, rating, numVotes])

This gives the expected result: restaurants with missing elements are no longer skipped, and each missing field is reported as 'N/A'.

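Overall, the best approach is to write a function that handles the missing-element logic for you whenever you look up an attribute or element on the page, and to use that function everywhere. The code becomes shorter and more cohesive, and it no longer violates the DRY principle.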
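
One possible shape for such a helper is sketched below. The names text_or_default and parse_card are hypothetical (they are not in the original script), and the selectors are copied from the loop above on the assumption that they still match the page. Instead of catching AttributeError, the sketch checks for None directly, which is what find() and select_one() return when nothing matches.

```python
def text_or_default(element, default="N/A", **get_text_kwargs):
    """Return the element's text, or `default` when the lookup found nothing.

    `element` is whatever find()/select_one() returned: a Tag or None.
    Extra keyword arguments are passed through to get_text().
    """
    if element is None:
        return default
    return element.get_text(**get_text_kwargs)


def parse_card(card):
    """Extract one restaurant's fields from a search-result container.

    `card` is expected to be a BeautifulSoup Tag; any object offering
    find() and select_one() works. Missing elements become "N/A".
    """
    # cost_for_two uses the second '|'-separated piece when there is one.
    # If the element is missing, "N/A".split("|") is ["N/A"], so the
    # fallback below still yields "N/A".
    cost_parts = text_or_default(
        card.select_one('[class*="col-s-11 col-m-12 pl0"]'),
        separator="|",
        strip=True,
    ).split("|")
    cost_for_two = cost_parts[1] if len(cost_parts) > 1 else cost_parts[0]

    return {
        "title": text_or_default(card.find("a", {"class": "result-title"})),
        "address": text_or_default(
            card.find("div", {"class": "search-result-address"})
        ),
        "district": text_or_default(
            card.find("a", {"class": "search_result_subzone"})
        ),
        "cost_for_two": cost_for_two,
        "cuisines": text_or_default(
            card.find("div", {"class": "res-snippet-small-establishment mt5"})
        ),
        "rating": text_or_default(card.select_one(".rating-popup")).strip(),
        "numVotes": text_or_default(
            card.select_one("[class^=rating-votes-div]")
        ),
    }
```

Inside the loop, the body then reduces to row = parse_card(zomato_container) followed by f.writerow(list(row.values())), since dicts keep insertion order in Python 3.7+. One behavioral difference from the loop above: a missing title also becomes 'N/A' here instead of raising.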