I'm new to web scraping. I'm trying to get some pub ratings, and I'd also like to pull as much data as I can from the Yelp page.
Here is my code:
import requests
from time import sleep
from bs4 import BeautifulSoup

pub_ratings = []
pub_reviews = []
pub_names = []
num_reviews = []

# loop over all result pages (10 results per page)
for i in range(0, 240, 10):
    url = "https://www.yelp.ie/search?find_desc=Pubs+%26+Bars&find_loc=london&ns=1&start={}".format(i)
    r = requests.get(url)
    soup_240 = BeautifulSoup(r.content, 'html.parser')
    sleep(1)
    all_data = soup_240.findAll('div', class_="container__09f24__21w3G hoverable__09f24__2nTf3 margin-t3__09f24__5bM2Z margin-b3__09f24__1DQ9x padding-t3__09f24__-R_5x padding-r3__09f24__1pBFG padding-b3__09f24__1vW6j padding-l3__09f24__1yCJf border--top__09f24__8W8ca border--right__09f24__1u7Gt border--bottom__09f24__xdij8 border--left__09f24__rwKIa border-color--default__09f24__1eOdn")
    # fill the lists with data from each result card
    for data in all_data:
        pub_names.append(data.find('a', class_='css-166la90').get_text(separator=' '))
        num_reviews.append(data.find('span',class_='reviewCount__09f24__EUXPN css-e81eai').get_text(separator=' '))
        pub_ratings.append(data.find('div', aria_label="").get_text(separator=' '))
This is the error I get:
AttributeError: 'NoneType' object has no attribute 'get_text'
Answer 0 (score: 0)
The data is embedded in the page as JSON. To parse it, you can use the following example:
import json
import requests
from bs4 import BeautifulSoup

url = "https://www.yelp.ie/search?find_desc=Pubs+%26+Bars&find_loc=london&ns=1"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# take the first JSON <script> tag and re-parse its contents to get the raw JSON text
data = BeautifulSoup(
    soup.select_one('script[type="application/json"]').contents[0],
    "html.parser",
).contents[0]
data = json.loads(data)

# uncomment to print all data:
# print(json.dumps(data, indent=4))


def search_biz(d):
    # recursively walk the nested JSON and yield each business entry
    if isinstance(d, dict):
        if "bizId" in d:
            yield d["searchResultBusiness"]
        else:
            for v in d.values():
                yield from search_biz(v)
    elif isinstance(d, list):
        for v in d:
            yield from search_biz(v)


for b in search_biz(data):
    print(b["name"])
    print(
        "Rating: {}\nAddress: {}\nPhone: {}\n".format(
            b["rating"], b["formattedAddress"], b["phone"]
        )
    )
Prints:
The Harp
Rating: 4.5
Address: 47 Chandos Place
Phone: 020 7836 0291

Cahoots Bar
Rating: 4.5
Address: 13 Kingly Court
Phone: 020 7352 6200

The Monkey Puzzle
Rating: 4.5
Address: 30 Southwick Street
Phone: 020 7723 0143

The Crobar
Rating: 4.5
Address: 17 Manette Street
Phone: 020 7439 0831

The Queen’s Head
Rating: 4
Address: 15 Denman Street
Phone: 020 7437 1540

The Queens Arms
Rating: 4.5
Address: 11 Warwick Way
Phone: 020 7834 3313

The Cauldron
Rating: 4.5
Address: 79 Stoke Newignton Road
Phone: 0117 456 2442

Coach and Horses
Rating: 4
Address: 5 Bruton Street
Phone: 020 7629 4123

The Victoria
Rating: 4.5
Address: 10a Strathearn Place
Phone: 020 7724 1191

The Ordnance
Rating: 4
Address: 29 Ordnance Hill
Phone: 020 7722 0278
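
As for the AttributeError itself: find() returns None when nothing matches, and calling .get_text() on None raises exactly that error. One likely culprit in the question's code is aria_label="": BeautifulSoup treats extra keyword arguments as attribute filters, so that call looks for an attribute literally named "aria_label" rather than "aria-label". A minimal sketch of the difference, using made-up markup (the "card" and "name" classes and the rating_label helper are hypothetical, not Yelp's real HTML):

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not Yelp's real HTML: one card has a rating, one does not.
html = """
<div class="card"><a class="name">The Harp</a><div aria-label="4.5 star rating"></div></div>
<div class="card"><a class="name">No Rating Pub</a></div>
"""
soup = BeautifulSoup(html, "html.parser")


def rating_label(card):
    """Return the aria-label of the first div that has one, or None."""
    # "aria-label" is not a valid Python keyword name, so pass it through attrs=.
    tag = card.find("div", attrs={"aria-label": True})
    return tag["aria-label"] if tag is not None else None


cards = soup.find_all("div", class_="card")
labels = [rating_label(card) for card in cards]
print(labels)  # ['4.5 star rating', None]

# The keyword form filters on an attribute named "aria_label" (underscore),
# which no tag has, so find() returns None.
print(cards[0].find("div", aria_label=True))  # None
```

Checking for None before calling .get_text() (or reading an attribute) keeps one missing element from aborting the whole loop.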
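
If you want the values in lists, as in the question, the same generator can feed them. The sketch below runs on a small hand-written dictionary so it works offline; the "searchPageProps", "adId" and "reviewCount" keys and all sample values are assumptions for illustration, and the real JSON is much larger and may be laid out differently. Using .get() means a missing field becomes None instead of raising an error.

```python
# Hypothetical stand-in for the parsed JSON; only "bizId" and
# "searchResultBusiness" are taken from the answer's code above.
sample = {
    "searchPageProps": [
        {"bizId": "a1", "searchResultBusiness": {"name": "The Harp", "rating": 4.5, "reviewCount": 120}},
        {"bizId": "b2", "searchResultBusiness": {"name": "Unrated Pub"}},
        {"adId": "x"},
    ]
}


def search_biz(d):
    """Yield every business dict nested anywhere inside d."""
    if isinstance(d, dict):
        if "bizId" in d:
            yield d["searchResultBusiness"]
        else:
            for v in d.values():
                yield from search_biz(v)
    elif isinstance(d, list):
        for v in d:
            yield from search_biz(v)


pub_names, pub_ratings, num_reviews = [], [], []
for b in search_biz(sample):
    pub_names.append(b.get("name"))
    pub_ratings.append(b.get("rating"))       # None when the key is missing
    num_reviews.append(b.get("reviewCount"))  # assumed key name

print(pub_names)    # ['The Harp', 'Unrated Pub']
print(pub_ratings)  # [4.5, None]
```

To cover several result pages, the same loop could be run once per page by adding the start parameter from the question's URL and parsing each page's JSON before calling search_biz on it.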