Accessing aria-labels and Yelp reviews with BeautifulSoup

Date: 2020-10-08 17:58:34

Tags: python beautifulsoup

I am trying to access each reviewer's review text and star rating and append those values to a list. However, I can't retrieve the output. Can anyone tell me what is wrong with my code?

l=[]
for i in range(0,len(allrev)):
    try:
        l["stars"]=allrev[i].allrev.find("div",{"class":"lemon--div__373c0__1mboc i-stars__373c0__1T6rz i-stars--regular-4__373c0__2YrSK border-color--default__373c0__3-ifU overflow--hidden__373c0__2y4YK"}).get('aria-label')
    except:
        l["stars"]= None
    try:
        l["review"]=allrev[i].find("span",{"class":"lemon--span__373c0__3997G raw__373c0__3rKqk"}).text
    except:
        l["review"]=None
    
        

u.append(l)
l={}
print({"data":u})

1 Answer:

Answer 0 (score: 1)

To get all of the reviews, you can try the following:

import requests
from bs4 import BeautifulSoup

URL = "https://www.yelp.com/biz/sushi-yasaka-new-york"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

for star, review in zip(
    soup.select(
        ".margin-b1__373c0__1khoT .border-color--default__373c0__3-ifU .border-color--default__373c0__3-ifU .border-color--default__373c0__3-ifU .overflow--hidden__373c0__2y4YK"
    ),
    soup.select(".comment__373c0__3EKjH .raw__373c0__3rcx7"),
):
    print(star.get("aria-label"))
    print(review.text)
    print("-" * 50)

Output:

5 star rating
I've been craving sushi for weeks now and Sushi Yasaka hit the spot for me. Their lunch prices are unbeatable. Their lunch specials seem to extend through weekends which is also amazing.I got the Miyabi lunch as take out and ate in along the benches near the MTA. It came with 4 nigiri, 7 sashimi and you get to pick the other roll (6 pieces). It also came with a side (choose salad or soup, add $1 for both). It was an incredible deal for only $20. I was so full and happy! The fish tasted very fresh with wonderful flavor. I ordered right as they opened and there were at least 10 people waiting outside when I picked up my food so I imagine there is high turnover, keeping the seafood fresh. This will be a regular splurge lunch spot for sure.
--------------------------------------------------
5 star rating
If you're looking for great sushi on Manhattan's upper west side, head over to Sushi Yakasa ! Best sushi lunch specials, especially for sashimi. I ordered the Miyabi - it included a fresh oyster ! The oyster was delicious, served raw on the half shell. The sashimi was delicious too. The portion size was very good for the area, which tends to be a pricey neighborhood. The restaurant is located on a busy street (west 72nd) & it was packed when I dropped by around lunchtimeStill, they handled my order with ease & had it ready quickly. Streamlined service & highly professional. It's a popular sushi place for a reason. Every piece of sashimi was perfect. The salmon avocado roll was delicious too. Very high quality for the price. Highly recommend! Update - I've ordered from Sushi Yasaka a few times since the pandemic & it's just as good as it was before. Fresh, and they always get my order correct. I like their takeout system - you can order over the phone (no app required) & they text you when it's ready. Home delivery is also available & very reliable. One of my favorite restaurants- I'm so glad they're still in business !
--------------------------------------------------
...
...
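
The pairing above works because `zip` matches the i-th star element with the i-th review element, so both selectors must return elements in the same page order. A minimal self-contained illustration of that pattern (the markup and class names here are simplified stand-ins, not Yelp's real hashed class names):

```python
from bs4 import BeautifulSoup

# Simplified stand-in markup; real Yelp pages use long hashed class names.
html = """
<div class="i-stars" aria-label="5 star rating"></div>
<p class="comment">Amazing food.</p>
<div class="i-stars" aria-label="3 star rating"></div>
<p class="comment">It was okay.</p>
"""
soup = BeautifulSoup(html, "html.parser")

# zip pairs elements positionally: first star with first comment, and so on.
pairs = [
    (star.get("aria-label"), review.text)
    for star, review in zip(soup.select(".i-stars"), soup.select(".comment"))
]
print(pairs)
```

Note that if one selector matches more elements than the other, `zip` silently drops the extras, so mismatched selectors can lose reviews without raising an error.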

EDIT: To get only the first 100 reviews:

import csv
import requests
from bs4 import BeautifulSoup

url = "https://www.yelp.com/biz/sushi-yasaka-new-york?start={}"
offset = 0
review_count = 0

with open("output.csv", "a", encoding="utf-8") as f:
    csv_writer = csv.writer(f, delimiter="\t")
    csv_writer.writerow(["rating", "review"])
    
    while True:
        resp = requests.get(url.format(offset))
        soup = BeautifulSoup(resp.content, "html.parser")

        ratings = soup.select(
            ".margin-b1__373c0__1khoT .border-color--default__373c0__3-ifU .border-color--default__373c0__3-ifU .border-color--default__373c0__3-ifU .overflow--hidden__373c0__2y4YK"
        )
        reviews = soup.select(".comment__373c0__3EKjH .raw__373c0__3rcx7")

        # Stop when a page returns no reviews, otherwise the loop never ends.
        if not reviews:
            break

        for rating, review in zip(ratings, reviews):
            print(f"review # {review_count}. link: {resp.url}")
            csv_writer.writerow([rating.get("aria-label"), review.text])

            review_count += 1
            if review_count >= 100:
                break

        if review_count >= 100:
            break

        offset += 20
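
As for why the original snippet in the question fails: `l` is initialized as a list (`l = []`) but then indexed with string keys, which raises a `TypeError`; there is a stray `.allrev` attribute access in `allrev[i].allrev.find(...)`; and `u.append(l)` sits outside the loop, so at most one result would be collected. A minimal corrected sketch of that per-review-dict approach, run against an inline HTML sample (the class names here are shortened placeholders for the hashed ones in the question):

```python
from bs4 import BeautifulSoup

# Inline sample standing in for a fetched Yelp page (class names shortened).
html = """
<div class="review"><div class="i-stars" aria-label="5 star rating"></div>
<span class="raw">Great sushi!</span></div>
<div class="review"><div class="i-stars" aria-label="4 star rating"></div>
<span class="raw">Solid lunch deal.</span></div>
"""
soup = BeautifulSoup(html, "html.parser")
allrev = soup.find_all("div", {"class": "review"})

u = []
for rev in allrev:          # iterate the elements directly, no index arithmetic
    l = {}                  # one dict per review, created inside the loop
    star = rev.find("div", {"class": "i-stars"})
    l["stars"] = star.get("aria-label") if star else None
    span = rev.find("span", {"class": "raw"})
    l["review"] = span.text if span else None
    u.append(l)             # append inside the loop, once per review

print({"data": u})
```

Scoping `find` to each review element (`rev.find(...)`) also guarantees that a star rating is never paired with text from a different review, which the positional `zip` approach cannot guarantee on its own.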