Question

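I'm new to RegEx (and to Python in general), and I'm trying to use it to read the temperature and weather description from a website's HTML tags.

To do this, I've tried to rework some examples from what I was shown in class and from what I've read online.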

import urllib.request

url = 'https://weather.com/en-AU/weather/today/l/-27.47,153.02'
contents = urllib.request.urlopen(url).read().decode("utf-8")

start_of_div = contents.find('<div class="today_nowcard-phrase">') # start of phrase line
end_of_div = start_of_div + contents[start_of_div:].find("</div>") + 6 # close of phrase line

phrase_area = contents[start_of_div:end_of_div]
print(phrase_area)

phrase = phrase_area.rfind(r'>(.*)<') # regex tester says this works
print(phrase)

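Then there is another section that gets the degrees, which uses the same layout. It should print a phrase such as "Sunny" or "Light rain" or whatever the weather is, plus the current temperature in degrees Celsius. Instead it prints: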

<div class="today_nowcard-phrase">Sunny</div>
-1
<div class="today_nowcard-temp"><span class="">21<sup>
-1

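It should print "Sunny" and "21" (the temperature right now), not -1. The RegEx works when I put it into a RegEx testing site, but not in my actual program (probably because of some obvious mistake I can't see). Any help would be appreciated.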

Answer 1

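As mentioned in the comments, use an HTML parser. All the elements have nice, unique class names that you can select on, e.g. .today_nowcard-temp (where the leading . is a CSS class selector that matches the element's class name).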

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://weather.com/en-AU/weather/today/l/-27.47,153.02')
soup = bs(r.content, 'html.parser')
temp = soup.select_one('.today_nowcard-temp').text
desc = soup.select_one('.today_nowcard-phrase').text
print(temp, desc)
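
As for why the original code prints -1: str.rfind searches for a literal substring and does not interpret regular expressions, so it looks for the exact characters ">(.*)<", never finds them, and returns -1. If you still want to try the regex route on the extracted snippets, a minimal sketch using the re module might look like the following. The helper name first_text_between_tags is made up for illustration, the sample strings are copied from the output in the question rather than fetched live, and the pattern is tightened to [^<>]+ on the assumption that you want the first run of text between tags (the greedy .* would swallow the nested span markup in the temperature snippet).

```python
import re

# Sample markup copied from the output shown in the question (not fetched live).
phrase_area = '<div class="today_nowcard-phrase">Sunny</div>'
temp_area = '<div class="today_nowcard-temp"><span class="">21<sup>'

# str.rfind does a plain substring search: it looks for the literal
# characters ">(.*)<", which never occur in the markup.
print(phrase_area.rfind(r'>(.*)<'))  # -1


def first_text_between_tags(html):
    """Hypothetical helper: return the first non-empty text found
    between a '>' and the next '<', or None if there is none."""
    match = re.search(r'>([^<>]+)<', html)
    return match.group(1) if match else None


print(first_text_between_tags(phrase_area))  # Sunny
print(first_text_between_tags(temp_area))    # 21
```

This only works on small, predictable snippets like these; for a whole page, the parser approach above is the more robust choice.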
