Question

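I'm practicing web scraping by pulling basic weather data (such as the daily high/low temperatures) from https://www.wunderground.com/ (searching for a random ZIP code).

I've tried many variations of my code, but it keeps returning an empty list where the temperatures should be. Honestly, I don't know how to work out where I'm going wrong. Can anyone point me in the right direction?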

import requests
from bs4 import BeautifulSoup
response = requests.get('https://www.wunderground.com/cgi-bin/findweather/getForecast?query=76502')
response_data = BeautifulSoup(response.content, 'html.parser')
results = response_data.select("strong.high")

I've also tried several other variations, such as:

results = response_data.find_all('strong', class_='high')
results = response_data.select('div.small_6 columns > strong.high')

Answer 1

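The data you are trying to parse is created dynamically by JavaScript, which requests cannot execute, so the elements are never present in the HTML it downloads. You should use selenium together with PhantomJS or any other driver. Here is an example using selenium and Chromedriver: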

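from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/cgi-bin/findweather/getForecast?query=76502'

# Start a real browser so the page's JavaScript actually runs
driver = webdriver.Chrome()
driver.get(url)

# Grab the HTML as rendered by the browser, then close the browser
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, 'html.parser')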
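One caveat with the snippet above: page_source is read as soon as get() returns, and the JavaScript may not have filled in the temperatures by then. A possible variation, sketched here under assumptions, waits until a given element exists before reading the source. The helper name fetch_rendered_html and the 15-second default timeout are made up for illustration; the selenium calls themselves (WebDriverWait, expected_conditions, By) are part of its standard Python API.

```python
def fetch_rendered_html(url, css_selector, timeout=15):
    """Load url in Chrome and return the page source once css_selector is present.

    Hypothetical helper for illustration; the name and default timeout are assumptions.
    """
    # Imported inside the function so it can be defined without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until the element is in the DOM; raises TimeoutException if it never appears.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

It could be called as html = fetch_rendered_html(url, 'strong.high') and the result passed to BeautifulSoup as before. Whether 'strong.high' is the right element to wait for depends on the page's current markup.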

After inspecting the elements in the browser, the low, high, and current temperatures can be extracted with:

high = soup.find('strong', {'class':'high'}).text
low = soup.find('strong', {'class':'low'}).text
now = soup.find('span', {'data-variable':'temperature'}).find('span').text

>>> low, high, now
('25', '37', '36.5')
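For future cases like this, a quick way to tell whether a page is rendered by JavaScript is to count selector matches in the raw HTML and compare that with what the browser's inspector shows. The sketch below is only an illustration: count_matches is a made-up helper, and the two HTML strings are invented stand-ins for the two situations, not real wunderground.com output.

```python
from bs4 import BeautifulSoup


def count_matches(html, selector):
    """Return how many elements in html match the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


# Invented markup standing in for the two cases (not real site output).
# What requests might receive: an empty container that a script fills in later.
static_html = "<div id='forecast'></div><script>/* fills #forecast later */</script>"
# What the browser DOM might look like after the script has run.
rendered_html = "<div id='forecast'><strong class='high'>37</strong></div>"

print(count_matches(static_html, "strong.high"))    # 0
print(count_matches(rendered_html, "strong.high"))  # 1
```

With a real page, count_matches(response.text, 'strong.high') returning 0 while the element is visible in the browser's inspector suggests the content is added by JavaScript after the initial download.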

Web scraping with BeautifulSoup returns an empty list
