Question

I'm using Anaconda and BeautifulSoup to scrape data from a site.

import requests
resp = requests.get('https://www.url.com')
Weathertest = resp.text

from bs4 import BeautifulSoup
soup = BeautifulSoup(Weathertest,'lxml') 

mintemp = BeautifulSoup(Weathertest, 'lxml')
mintemp.find_all('p',class_='weatherhistory_results_datavalue temp_mn')

What I want to do is pull the minimum temperature for a given day. This is the page's HTML:

<tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">&#176;F</span></p></td></tr>

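After trying the above and getting [] as the result, I realized that the weatherhistory class is on a tr tag, not a p tag, so the above doesn't work. Instead, I tried: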

mintemp = BeautifulSoup(Weathertest, 'lxml')
mintemp.find_all('tr',class_='weatherhistory_results_datavalue temp_mn')

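What I got back was the entire HTML string above (from the tr class through /tr). I tried to find out how to extract the p value from inside the tr class, but I haven't come up with anything. I'm very new to all of this, so I'm sure it's something simple that I just don't know yet.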

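Or maybe I need a compound statement, something like "find all the tr classes above, then give me the p value", but I'm not sure how to write that.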

Answer 1

Try this:

>>> data = """<tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">&#176;F</span></p></td></tr>"""
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(data, "lxml")
>>> # find every tr whose class attribute is exactly this string
>>> temp = soup.find_all("tr", {"class": "weatherhistory_results_datavalue temp_mn"})
>>> for i in temp:
...     # the number itself lives in the nested <span class="value">
...     a = i.find("span", {"class": "value"})
...     print(a.text)
...
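
The "compound statement" the question asks about can also be written as a single CSS selector. The following is only a minimal sketch: it assumes the markup from the question, wraps it in a `<table>` element (an addition, not part of the original page snippet), and uses the built-in `html.parser` instead of `lxml` so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; the surrounding <table> is an assumption
# added so the row is kept regardless of which parser is used.
html = """<table><tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">&#176;F</span></p></td></tr></table>"""

# Assumption: the built-in parser is used here in place of lxml.
soup = BeautifulSoup(html, "html.parser")

# "a tr carrying both classes, then the span.value inside it" in one selector
span = soup.select_one("tr.weatherhistory_results_datavalue.temp_mn span.value")

# select_one returns None when nothing matches, so guard before reading text
min_temp_text = span.get_text(strip=True) if span is not None else None
print(min_temp_text)
```

The selector form does not depend on the order in which the two classes appear in the `class` attribute, which may make it a little more robust than matching the full class string.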

Answer 2

You can first extract the tr elements with the class "weatherhistory_results_datavalue temp_mn" by calling:

trs = mintemp.find_all('tr', {'class':'weatherhistory_results_datavalue temp_mn'})

This returns a list of every element that matches that class.
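
To illustrate what that call gives back, here is a small sketch. The second row (`temp_mx`, 68.0) is a made-up example that is not in the original page snippet, and `html.parser` is assumed in place of `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table; only the temp_mn row comes from the question.
html = """<table>
<tr class="weatherhistory_results_datavalue temp_mn"><td><p><span class="value">47.3</span></p></td></tr>
<tr class="weatherhistory_results_datavalue temp_mx"><td><p><span class="value">68.0</span></p></td></tr>
</table>"""

mintemp = BeautifulSoup(html, "html.parser")  # assumption: built-in parser
trs = mintemp.find_all("tr", {"class": "weatherhistory_results_datavalue temp_mn"})

# The result behaves like a list: it has a length and supports indexing.
print(len(trs))
first_row = trs[0] if trs else None  # an empty result means nothing matched
```

An empty result, like the `[]` seen in the question, simply means no element matched both the tag name and the class.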

After that, you can loop over the results and find the span tag inside each one:

for result in trs:
    temp_str = result.find('span', {'class':'value'}).text
    temp = float(temp_str) # convert temperature string to float

temp is now a float holding the temperature (from the last matching row, if there are several).
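
Because the loop above overwrites temp on every pass, one possible variation is to collect the values in a list and skip rows that have no usable number. This is only a sketch: the function name `extract_min_temps` is hypothetical, and `html.parser` is assumed in place of `lxml`.

```python
from bs4 import BeautifulSoup


def extract_min_temps(html):
    """Return every minimum-temperature value found in html as a float.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")  # assumption: built-in parser
    temps = []
    # class_="temp_mn" matches any tr that has temp_mn among its classes
    for row in soup.find_all("tr", class_="temp_mn"):
        span = row.find("span", class_="value")
        if span is None:
            continue  # row without a value span
        try:
            temps.append(float(span.get_text(strip=True)))
        except ValueError:
            continue  # text that is not a number
    return temps


sample = """<table><tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">&#176;F</span></p></td></tr></table>"""
print(extract_min_temps(sample))
```

Returning a list keeps all matching rows available, so the caller can decide whether to take the first value, the last, or all of them.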

Scraping the "p" tag value associated with a "tr class" tag
