Scraping information at multiple levels across multiple pages

Time: 2019-05-28 07:39:07

Tags: python python-3.x beautifulsoup code-formatting

I'm looking for help modifying a BeautifulSoup script so that it scrapes the information I need from the following website: https://www.skiresort.info/ski-resorts/europe

This nice bs4 work was done by beaubellamy and is available at https://github.com/beaubellamy/SkiResortScraper/blob/master/SkiresortScraper/SkiresortScraper.py

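My problem: in addition to what this script already collects, I need to capture the lift operator information (name + street address + postal code + city + tel + fax + email) and the tourist office information (name + street address + postal code + city + tel + fax + email) for every ski resort in Europe, but I don't know how or where to include this in the script. For example, this information is available at: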

https://www.skiresort.info/ski-resort/belpiano-schoenebenmalga-san-valentino-haideralm/lift-operator/

and/or

https://www.skiresort.info/ski-resort/belpiano-schoenebenmalga-san-valentino-haideralm/tourist-info/
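Both pages sit directly under the resort's own URL, so the two addresses can be derived from the resort URL the script already has. A minimal sketch, assuming every resort follows the same URL layout as the example above (the helper name `contact_page_urls` is made up for illustration and is not part of the original script):

```python
def contact_page_urls(resort_url):
    """Return the two contact sub-page URLs for one resort page URL."""
    # Normalise the base so it ends with exactly one slash before appending.
    base = resort_url.rstrip("/") + "/"
    return {
        "lift_operator": base + "lift-operator/",
        "tourist_info": base + "tourist-info/",
    }


urls = contact_page_urls(
    "https://www.skiresort.info/ski-resort/belpiano-schoenebenmalga-san-valentino-haideralm/"
)
print(urls["lift_operator"])
print(urls["tourist_info"])
```

Whether every resort actually has both sub-pages is not something this sketch checks; a missing page would have to be handled when the URL is fetched.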

The function below is the model I would like to use to capture this information, but I have not managed to adapt it.

A working example from the script:

def get_report_scores(resortUrl):
    """
    Return the resort report scores as a dict of attribute -> score
    """
    # Construct the url for the report.
    reportUrl = resortUrl + "test-result/"

    # Get the content of the report for the resort
    reportContent = get_html_content(reportUrl)

    # Find every rating element on the report page
    reportHtml = BeautifulSoup(reportContent, 'html.parser')
    report = reportHtml.findAll("div", {"class": "stars-link-element"})

    # rating dictionary
    rating = {}

    # Extract each score for each report attribute.
    for item in report:
        end = item['title'].find("out")
        score = float(item['title'][0:end])
        attribute = item.contents[5].text

        #print(attribute,": ",score)
        rating[attribute] = score

    return rating
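The score extraction in the loop above slices the `title` attribute up to the word "out". The same step can be isolated as a small function; this is only a sketch, and the title format "4.1 out of 5 stars" used here is an assumption inferred from the slicing, not something confirmed against the site. The name `parse_score` is hypothetical.

```python
def parse_score(title):
    """Return the number that precedes the word "out" in a rating title."""
    end = title.find("out")
    if end == -1:
        # str.find returns -1 when "out" is missing; slicing with -1 would
        # silently drop the last character instead of failing.
        raise ValueError("unexpected rating title: %r" % title)
    return float(title[:end])


# Assumed title format, for illustration only.
print(parse_score("4.1 out of 5 stars"))
```

Raising an error on an unexpected title makes a changed page layout visible instead of producing a wrong score.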

Could someone help me write the correct code to get the lift operator information from the page source of: https://www.skiresort.info/ski-resort/belpiano-schoenebenmalga-san-valentino-haideralm/lift-operator/

It might start like this:

def get_report_scores(resortUrl):
    """
    Get the lift operator's contact information
    """
    # Construct the url for the lift operator page.
    reportUrl = resortUrl + "lift-operator/"

    # Get the content of the lift operator page for the resort
    reportContent = get_html_content(reportUrl)

    # Find the elements that hold the contact details
    reportHtml = BeautifulSoup(reportContent, 'html.parser')
    report = reportHtml.findAll("div", {"class": "................"})

    # Extract each contact field for the ski resort (still to be filled in).
    Operator_Name = None
    Street_address = None
    Street_address_postalcode = None
    Street_address_city = None
    Street_address_country = None
    operator_tel = None
    operator_fax = None
    operator_email = None
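One possible way to fill those fields, once the right element has been located, is to take its text line by line and sort the lines into fields. The sketch below works on plain text only, such as what `element.get_text("\n", strip=True)` returns for a BeautifulSoup element. Everything about the layout is an assumption: the labels ("Tel.", "Fax", "E-mail"), the order of the unlabelled lines (name, street, country), and the sample values, which are made up. The function name `parse_contact_block` is hypothetical as well.

```python
import re

# All patterns are guesses at how the fields might be labelled on the page;
# they must be checked against the real page text.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"^(?:telephone|tel|phone)\.?\s*:?\s*(.+)$", re.IGNORECASE)
FAX_RE = re.compile(r"^fax\.?\s*:?\s*(.+)$", re.IGNORECASE)
POSTAL_CITY_RE = re.compile(r"^(\d{4,5})\s+(.+)$")

FIELDS = ("name", "street", "postal_code", "city", "country", "tel", "fax", "email")


def parse_contact_block(text):
    """Sort the lines of a contact block into a dict of contact fields."""
    contact = dict.fromkeys(FIELDS)  # every field starts as None
    unlabelled = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        email = EMAIL_RE.search(line)
        phone = PHONE_RE.match(line)
        fax = FAX_RE.match(line)
        postal = POSTAL_CITY_RE.match(line)
        if email:
            contact["email"] = email.group(0)
        elif phone:
            contact["tel"] = phone.group(1).strip()
        elif fax:
            contact["fax"] = fax.group(1).strip()
        elif postal:
            contact["postal_code"] = postal.group(1)
            contact["city"] = postal.group(2).strip()
        else:
            unlabelled.append(line)
    # Assumed order of the remaining lines: name, street, country.
    for key, value in zip(("name", "street", "country"), unlabelled):
        contact[key] = value
    return contact


# Made-up sample text, not taken from the site.
sample = """
Example Lifts Ltd
1 Example Street
12345 Exampletown
Exampleland
Tel.: +00 000 0000
Fax: +00 000 0001
E-mail: info@example.com
"""
contact = parse_contact_block(sample)
print(contact)
```

Because the same fields are wanted for the tourist office, the same function could in principle be reused for the `tourist-info/` page, provided its text follows a similar layout.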

Am I clear? I hope so :)

Thank you very much for your time...

0 answers:

No answers