BeautifulSoup: complex ul/li tags

Time: 2018-08-21 01:09:35

Tags: python-3.x web-scraping

The data is presented on the website in this format:

Complex data in li and ul

I need to extract the data in this format:

[["customer1", "Windermere, FL", "This is comment of customer1", "1star", "5/5/2018"],
 ["customer2", "Windermere, FL", "This is comment of customer2", "1star", "5/5/2018"]]

Please help.

1 answer:

Answer 0 (score: 1)

from bs4 import BeautifulSoup
import requests

my_url = "https://www.yelp.com/biz/burger-21-orlando?osq=burger"
my_html = requests.get(my_url)

my_soup = BeautifulSoup(my_html.text, "html.parser")

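outer_div = my_soup.find(class_="ylist ylist-bordered reviews")
if outer_div is None:
    # find() returns None when nothing matches, e.g. if the page markup changed.
    raise SystemExit("Review list not found")

# find_all is the current name; findAll is a deprecated alias.
All_inner_div_list = outer_div.find_all(class_="review review--with-sidebar")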

def text_or_blank(node):
    # find() returns None when nothing matches (it does not raise),
    # so test for None instead of wrapping find() in try/except.
    return node.text.strip() if node is not None else "BlankValue"


for record in All_inner_div_list:
    name = record.find(class_="user-display-name js-analytics-click")
    location = record.find(class_="user-location responsive-hidden-small")
    date = record.find(class_="biz-rating biz-rating-large clearfix")
    review = record.find('p')

    print("Name: {}".format(text_or_blank(name)))
    print("Location: {}".format(text_or_blank(location)))
    # The rating block's text begins with the date; keep its first 10 characters.
    print("Date: {}".format(text_or_blank(date)[0:10].rstrip("\n\r")))
    print("Review: {}".format(text_or_blank(review)))
    print("______________________________________________________")
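
The loop above prints each field, while the question asks for a list of lists. A minimal sketch of collecting the same fields into that shape follows. It runs on a small inline HTML sample rather than the live page. The sample markup, the helper names text_of and extract_reviews, and the class names i-stars and rating-qualifier (used here for the rating and the date) are assumptions for illustration, not taken from the original answer; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modeled on the class names in the answer.
SAMPLE_HTML = """
<ul class="ylist ylist-bordered reviews">
  <li><div class="review review--with-sidebar">
    <a class="user-display-name js-analytics-click">customer1</a>
    <div class="user-location responsive-hidden-small"><b>Windermere, FL</b></div>
    <div class="biz-rating biz-rating-large clearfix">
      <div class="i-stars" title="1.0 star rating"></div>
      <span class="rating-qualifier">5/5/2018</span>
    </div>
    <p>This is comment of customer1</p>
  </div></li>
  <li><div class="review review--with-sidebar">
    <a class="user-display-name js-analytics-click">customer2</a>
    <div class="user-location responsive-hidden-small"><b>Windermere, FL</b></div>
    <div class="biz-rating biz-rating-large clearfix">
      <div class="i-stars" title="1.0 star rating"></div>
      <span class="rating-qualifier">5/5/2018</span>
    </div>
    <p>This is comment of customer2</p>
  </div></li>
</ul>
"""

BLANK = "BlankValue"


def text_of(parent, css_class, default=BLANK):
    """Return the stripped text of the first descendant carrying css_class."""
    node = parent.find(class_=css_class)
    return node.get_text(strip=True) if node is not None else default


def extract_reviews(html):
    """Build one [name, location, comment, rating, date] row per review block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # A single class name matches any element whose class list contains it.
    for record in soup.find_all(class_="review--with-sidebar"):
        stars = record.find(class_="i-stars")  # assumed rating element
        comment = record.find("p")
        rows.append([
            text_of(record, "user-display-name"),
            text_of(record, "user-location"),
            comment.get_text(strip=True) if comment is not None else BLANK,
            stars.get("title", BLANK) if stars is not None else BLANK,
            text_of(record, "rating-qualifier"),  # assumed date element
        ])
    return rows


rows = extract_reviews(SAMPLE_HTML)
for row in rows:
    print(row)
```

To try this against a live page, pass the text of a requests response to extract_reviews instead of SAMPLE_HTML. Matching on one class name rather than the full multi-class string is a design choice here: it keeps working if the site adds or reorders other classes on the same element.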