BeautifulSoup: AttributeError: 'NoneType' object has no attribute 'text', and 'NoneType' object is not subscriptable

Time: 2019-07-24 07:52:06

Tags: beautifulsoup

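I am using Beautiful Soup, but I get the errors "AttributeError: 'NoneType' object has no attribute 'get_text'" and "TypeError: 'NoneType' object is not subscriptable".

I know my code works when I search for a single restaurant. However, when I try to loop over all the restaurants, I get these errors.

Here is a screen recording that shows the problem: https://streamable.com/pok13

The rest of the code can be found here: https://pastebin.com/wsv1kfNm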

# AttributeError: 'NoneType' object has no attribute 'get_text'
restaurant_address = yelp_containers[yelp_container].find("address", {
  "class": 'lemon--address__373c0__2sPac'
}).get_text()
print("restaurant_address: ", restaurant_address)




# TypeError: 'NoneType' object is not subscriptable
restaurant_starCount = yelp_containers[yelp_container].find("div", {
  "class": "lemon--div__373c0__1mboc i-stars__373c0__30xVZ i-stars--regular-4__373c0__2R5IO border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I"
})['aria-label']
print("restaurant_starCount: ", restaurant_starCount)



# AttributeError: 'NoneType' object has no attribute 'text'
restaurant_district = yelp_containers[yelp_container].find("div", {
  "class": "lemon--div__373c0__1mboc display--inline-block__373c0__25zhW border-color--default__373c0__2xHhl"
}).text
print("restaurant_district: ", restaurant_district)

1 answer:

Answer 0 (score: 0)

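You get these errors because your selectors are too specific and you do not check whether a tag was actually found. When find() matches nothing it returns None, so calling .get_text() or .text on the result, or indexing it with ['aria-label'], fails. One solution is to loosen the selectors (the lemon--div-XXX... selectors will probably change in the near future anyway):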

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import csv
import re

my_url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=San%20Francisco%2C%20CA'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
bs = soup(page_html, "html.parser")

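# Select the list items that follow the "All Results" header and contain "read more".
# Note: newer soupsieve releases deprecate :contains() in favor of :-soup-contains().
yelp_containers = bs.select('li:contains("All Results") ~ li:contains("read more")')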

for idx, item in enumerate(yelp_containers, 1):
    print("--- Restaurant number #", idx)

    restaurant_title = item.h3.get_text(strip=True)
    restaurant_title = re.sub(r'^[\d.\s]+', '', restaurant_title)
    restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
    restaurant_numReview = item.select_one('[class*="reviewCount"]').get_text(strip=True)
    restaurant_numReview = re.sub(r'[^\d.]', '', restaurant_numReview)
    restaurant_starCount = item.select_one('[class*="stars"][aria-label]')['aria-label']
    restaurant_starCount = re.sub(r'[^\d.]', '', restaurant_starCount)
    pr = item.select_one('[class*="priceRange"]')
    restaurant_price = pr.get_text(strip=True) if pr else '-'
    restaurant_category = [a.get_text(strip=True) for a in item.select('[class*="priceRange"] ~ span a')]
    restaurant_district = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[-1]

    print(restaurant_title)
    print(restaurant_address)
    print(restaurant_numReview)
    print(restaurant_price)
    print(restaurant_category)
    print(restaurant_district)
    print('-' * 80)

Prints:

--- Restaurant number # 1
Fog Harbor Fish House
Pier 39
5487
$$
['Seafood', 'Bars']
Fisherman's Wharf
--------------------------------------------------------------------------------
--- Restaurant number # 2
The House
1230 Grant Ave
4637
$$$
['Asian Fusion']
North Beach/Telegraph Hill
--------------------------------------------------------------------------------

...and so on.
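
The code above guards only the price lookup (pr) against a missing tag. As a minimal sketch of applying the same check to every field, the two helpers below return a placeholder when find() or select_one() gives back None. The names text_or_default, attr_or_default and demo, and the inline HTML, are hypothetical and made up for illustration; they are not part of Beautiful Soup or of Yelp's markup.

```python
def text_or_default(tag, default="-"):
    """Return the stripped text of a tag, or `default` if the lookup returned None."""
    # find()/select_one() return None when nothing matches, so test for None explicitly.
    return tag.get_text(strip=True) if tag is not None else default


def attr_or_default(tag, name, default="-"):
    """Return attribute `name` of a tag, or `default` if the tag or the attribute is missing."""
    if tag is None:
        return default
    # Tag.get() behaves like dict.get(): no exception for an absent attribute.
    return tag.get(name, default)


def demo():
    # Imported here so the helpers above can be reused without bs4 installed.
    from bs4 import BeautifulSoup

    # Hypothetical markup: the first item has no star rating, the second no address.
    html = (
        '<li><address>Pier 39</address></li>'
        '<li><div class="i-stars" aria-label="4.5 star rating"></div></li>'
    )
    soup = BeautifulSoup(html, "html.parser")
    for item in soup.find_all("li"):
        address = text_or_default(item.find("address"))
        stars = attr_or_default(
            item.select_one('[class*="stars"][aria-label]'), "aria-label"
        )
        print(address, "|", stars)
```

Calling demo() walks both list items without raising, printing the placeholder for whichever field is missing. Whether to substitute a placeholder or skip the restaurant entirely is a design choice that depends on what the rest of the script does with the data.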