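I am looking for help modifying a BeautifulSoup script so that it scrapes the information I need from the following website: https://www.skiresort.info/ski-resorts/europe.
This nice bs4 work was done by beaubellamy and is available at https://github.com/beaubellamy/SkiResortScraper/blob/master/SkiresortScraper/SkiresortScraper.py
My problem: in addition to what this script already collects, I need to capture the lift operator details (name + street address + postal code + city + tel + fax + email) and the tourist office details (name + street address + postal code + city + tel + fax + email) for every ski resort in Europe, but I do not know how or where to include this in the script. For example, this information is available at:
https://www.skiresort.info/ski-resort/belpiano-schoenebenmalga-san-valentino-haideralm/lift-operator/
and/or
https://www.skiresort.info/ski-resort/belpiano-schoenebenmalga-san-valentino-haideralm/tourist-info/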
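The template below, which I tried to use to capture this information, is not adapted to these pages.
Example of working code included in the script: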
    from bs4 import BeautifulSoup

    def get_report_scores(resortUrl):
        """
        Return the resort report scores as a dict of attribute -> score
        """
        # Construct the url for the report.
        reportUrl = resortUrl + "test-result/"
        # Get the content of the report for the resort
        # (get_html_content is defined elsewhere in the script)
        reportContent = get_html_content(reportUrl)
        # Parse the report page and collect every rating element
        reportHtml = BeautifulSoup(reportContent, 'html.parser')
        report = reportHtml.find_all("div", {"class": "stars-link-element"})
        # rating dictionary
        rating = {}
        # Extract each score for each report attribute.
        for item in report:
            # The title starts with the score, followed by the word "out"
            end = item['title'].find("out")
            score = float(item['title'][0:end])
            attribute = item.contents[5].text
            #print(attribute,": ",score)
            rating[attribute] = score
        return rating
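Can someone help me write the correct code to get the lift operator information from the source of the following page: https://www.skiresort.info/ski-resort/belpiano-schoenebenmalga-san-valentino-haideralm/lift-operator/ ?
It might start like this: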
    def get_lift_operator_info(resortUrl):
        """
        Get the lift operator contact info
        """
        # Construct the url for the lift operator page.
        reportUrl = resortUrl + "lift-operator/"
        # Get the content of the lift operator page for the resort
        reportContent = get_html_content(reportUrl)
        # Parse the lift operator page
        reportHtml = BeautifulSoup(reportContent, 'html.parser')
        report = reportHtml.find_all("div", {"class": "................"})
        # Extract each contact for each ski resort.
        # Placeholders: this is the part I do not know how to fill in.
        Operator_Name = None
        Street_address = None
        Street_address_postalcode = None
        Street_address_city = None
        Street_address_country = None
        operator_tel = None
        operator_fax = None
        operator_email = None
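One direction that might work, as an untested sketch: since I do not know the class name or markup of the contact block, this version only works on the plain text of whatever element holds the contact details (with BeautifulSoup that text could be obtained with `element.get_text("\n", strip=True)`). The function name `parse_contact_block`, the label patterns ("Tel", "Fax") and the assumed line order (name, street, then postal code and city) are all my assumptions, and the sample text is made up, not taken from the site.

```python
import re

# Hypothetical patterns: the real page may label or order these lines differently.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
TEL_RE = re.compile(r"^(?:telephone|tel|phone)\b\.?\s*:?\s*(.+)$", re.IGNORECASE)
FAX_RE = re.compile(r"^fax\b\.?\s*:?\s*(.+)$", re.IGNORECASE)
POSTAL_CITY_RE = re.compile(r"^([A-Z]{0,3}-?\d{4,5})\s+(.+)$")


def parse_contact_block(text):
    """Split the plain text of one contact block into named fields."""
    contact = {"name": None, "street": None, "postal_code": None,
               "city": None, "tel": None, "fax": None, "email": None}
    address_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tel = TEL_RE.match(line)
        fax = FAX_RE.match(line)
        email = EMAIL_RE.search(line)
        if tel:
            contact["tel"] = tel.group(1).strip()
        elif fax:
            contact["fax"] = fax.group(1).strip()
        elif email:
            contact["email"] = email.group(0)
        else:
            # Anything without a recognised label is treated as address text.
            address_lines.append(line)
    # Assumed order of the remaining lines: name, street, "postal-code city".
    for line in address_lines:
        postal = POSTAL_CITY_RE.match(line)
        if postal and contact["postal_code"] is None:
            contact["postal_code"] = postal.group(1)
            contact["city"] = postal.group(2).strip()
        elif contact["name"] is None:
            contact["name"] = line
        elif contact["street"] is None:
            contact["street"] = line
    return contact


# Made-up sample text, only to show the expected input shape.
sample = """Example Lifts Ltd.
Main Street 1
12345 Exampletown
Tel.: +00 123 456789
Fax: +00 123 456780
E-mail: info@example.com
"""
result = parse_contact_block(sample)
print(result)
```

If this approach is reasonable, the same function could presumably be reused for the `tourist-info/` page, since the fields I need there are the same.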
Am I being clear? I hope so :)
Thank you very much for your time...