Question

I'm writing a weather forecast program for homework, and it needs to print:

Today's temperatures: maximum 2ºC, minimum -1ºC

At the moment it prints:

Today's temperatures:      <title>Thursday: Light Snow Shower, Maximum Temperature: 2Â°C (36Â°F) Minimum Temperature: -1Â°C (30Â°F)</title>

How can I make sure it prints only the information I need? Here is my code:

import urllib

url = 'http://open.live.bbc.co.uk/weather/feeds/en/2654993/3dayforecast.rss'
web_connection = urllib.urlopen(url)

for line in web_connection.readlines():
    if line.find('Thursday:') != -1:
        print "Today's temperatures:" + line

web_connection.close()

Answer 1

You can do this with a regular expression:

import re

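# Raw string (r"...") so that \s reaches the regex engine unchanged
TEMP_REGEX = r"^.*Maximum\s+Temperature:\s+(?P<max>[+-]?[0-9]*[.,]?[0-9]*).*Minimum\s+Temperature:\s+(?P<min>[+-]?[0-9]*[.,]?[0-9]*).*$"

# `line` is the forecast line found by the loop in the question
matched = re.match(TEMP_REGEX, line)

if matched:
    # max_temp/min_temp rather than max/min, so the built-ins are not shadowed
    max_temp = matched.group("max")
    min_temp = matched.group("min")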

.....
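To try the regex without fetching the feed, here is a minimal, self-contained Python 3 sketch. It assumes the forecast line looks like the <title> line quoted in the question (with a correctly decoded ° instead of Â°). The pattern is a slightly tightened variant of TEMP_REGEX that requires at least one digit in each group, and the helper name extract_temps is hypothetical.

```python
import re

# Variant of TEMP_REGEX that requires at least one digit in each named group
TEMP_REGEX = re.compile(
    r"Maximum\s+Temperature:\s+(?P<max>[+-]?\d+(?:[.,]\d+)?)"
    r".*Minimum\s+Temperature:\s+(?P<min>[+-]?\d+(?:[.,]\d+)?)"
)


def extract_temps(line):
    """Return {'max': ..., 'min': ...} as strings, or None if nothing matches."""
    matched = TEMP_REGEX.search(line)
    if matched is None:
        return None
    # groupdict() maps each named group to the text it captured
    return matched.groupdict()


# Sample line modelled on the output shown in the question
sample = ("<title>Thursday: Light Snow Shower, Maximum Temperature: 2°C (36°F) "
          "Minimum Temperature: -1°C (30°F)</title>")

print(extract_temps(sample))
```

Named groups make the result self-describing: the caller reads "max" and "min" instead of relying on group positions.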

Answer 2

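The proper approach is to parse the file as the RSS (that is, XML) document it is. You can start by studying the XML module documentation here. Here is a short snippet to get you started: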

import urllib
from xml.etree import ElementTree as ET

url = 'http://open.live.bbc.co.uk/weather/feeds/en/2654993/3dayforecast.rss'
web_conn = urllib.urlopen(url)
rss = web_conn.read()
web_conn.close()

weather_data = ET.fromstring(rss)
for node in weather_data.iter():
    if node.tag == "item":
        title = node.find("title").text
        if title.find("Thursday") != -1:
            todays_weather = node.find("description").text.split(',')
            for entry in todays_weather:
                print entry.strip()

Output:

Maximum Temperature: 2°C (36°F)
Minimum Temperature: -1°C (30°F)
Wind Direction: Westerly
Wind Speed: 6mph
Visibility: Very Good
Pressure: 977mb
Humidity: 87%
UV Risk: 1
Pollution: Low
Sunrise: 07:59 GMT
Sunset: 16:42 GMT

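How and why? If you open the RSS file in a browser, you will see that it is XML, which means it has a well-defined structure. Looking at it, you will notice that each day's forecast is wrapped in an <item> element, which contains child elements such as <title> and <description>. With an XML parser you can navigate this structure easily using intuitive methods such as .find() and .findall(), and read the data through the .text attribute.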
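A minimal Python 3 sketch of the same idea, run on a small inline string instead of the live feed. The SAMPLE_RSS document is hypothetical: it only mimics the <item>/<title>/<description> layout described above, and the function name forecast_for is illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down feed with the same <item>/<title>/<description> layout
SAMPLE_RSS = """<rss version="2.0"><channel>
<item>
<title>Thursday: Light Snow Shower, Maximum Temperature: 2°C (36°F) Minimum Temperature: -1°C (30°F)</title>
<description>Maximum Temperature: 2°C (36°F), Minimum Temperature: -1°C (30°F), Wind Direction: Westerly</description>
</item>
<item>
<title>Friday: Sunny Intervals, Maximum Temperature: 4°C (39°F) Minimum Temperature: 0°C (32°F)</title>
<description>Maximum Temperature: 4°C (39°F), Minimum Temperature: 0°C (32°F), Wind Direction: Northerly</description>
</item>
</channel></rss>"""


def forecast_for(rss_text, day):
    """Return the comma-separated description fields for the given weekday."""
    root = ET.fromstring(rss_text)
    # findall() with a path returns every <item> directly under <channel>
    for item in root.findall("channel/item"):
        title = item.find("title").text
        if title.startswith(day + ":"):
            description = item.find("description").text
            return [entry.strip() for entry in description.split(",")]
    return []  # no forecast for that day in the feed


for entry in forecast_for(SAMPLE_RSS, "Thursday"):
    print(entry)
```

Checking title.startswith(day + ":") rather than searching anywhere in the title avoids matching a weekday name that happens to appear later in the text.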

Answer 3

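You have three problems to solve: first, get the name of today's day of the week; second, find the line containing the maximum and minimum temperatures; third, parse those temperature values. I think this should work: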

# -*- coding: utf-8 -*-
import re
import time
import urllib

url = 'http://open.live.bbc.co.uk/weather/feeds/en/2654993/3dayforecast.rss'
web_connection = urllib.urlopen(url)

# Today's weekday name, e.g. "Thursday"; computed once, outside the loop
day_of_the_week = time.strftime("%A")
max_temp = min_temp = None

for line in web_connection.readlines():
    if '<title>' + day_of_the_week + ':' in line:
        # (-?\d+) captures the optional minus sign and the digits
        m = re.search(r'Maximum Temperature:\s*(-?\d+)°C.*Minimum Temperature:\s*(-?\d+)°C', line)
        if m:
            max_temp = m.group(1)
            min_temp = m.group(2)

web_connection.close()

if max_temp is not None:
    print("Today's temperatures: maximum " + max_temp + "°C, minimum " + min_temp + "°C")

To get the name of the day of the week, see https://docs.python.org/2/library/time.html#time.strftime
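A small Python 3 sketch of that step. time.strftime("%A") gives today's weekday name; passing an explicit date's timetuple() makes the result reproducible. %A is locale-dependent, so this assumes the default C (English) locale, which matches the English day names in the BBC feed. The helper name title_prefix is hypothetical.

```python
import datetime
import time


def title_prefix(date=None):
    """Build the '<title>Weekday:' prefix used to spot a day's forecast line."""
    if date is None:
        day_name = time.strftime("%A")  # weekday of the current local date
    else:
        day_name = time.strftime("%A", date.timetuple())
    return "<title>" + day_name + ":"


# 1 January 2015 was a Thursday
print(title_prefix(datetime.date(2015, 1, 1)))
```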

Then I found the right line the same way you did (just using Python's 'in' operator instead).
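For plain substring checks, 'in' is the idiomatic replacement for str.find(...) != -1, and the two tests below select the same lines. The sample lines are made up for illustration.

```python
lines = [
    "<title>Thursday: Light Snow Shower, Maximum Temperature: 2°C (36°F)</title>",
    "<description>Maximum Temperature: 2°C (36°F), Minimum Temperature: -1°C</description>",
]
prefix = "<title>Thursday:"

# Style from the question: find() returns -1 when the substring is absent
with_find = [line for line in lines if line.find(prefix) != -1]
# Idiomatic style: the `in` operator returns a bool directly
with_in = [line for line in lines if prefix in line]

print(with_in == with_find)  # True
print(len(with_in))          # 1
```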

After that I used a regular expression with groups to parse out the numbers (and their signs!). To help you design regular expressions, you can try https://regex101.com/#python
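A self-contained Python 3 sketch of that regex step: two positional groups capture an optional minus sign and the digits, and the values are then formatted into the sentence the homework asks for. The (-?\d+) groups assume whole-number temperatures, and the function name format_temps is hypothetical.

```python
import re

# Group 1 and group 2 each capture an optional "-" followed by digits
PATTERN = re.compile(r"Maximum Temperature:\s*(-?\d+)°C.*Minimum Temperature:\s*(-?\d+)°C")


def format_temps(line):
    """Turn a forecast title line into the homework's output sentence."""
    m = PATTERN.search(line)
    if m is None:
        return None
    max_temp, min_temp = m.groups()
    return "Today's temperatures: maximum %s°C, minimum %s°C" % (max_temp, min_temp)


line = ("<title>Thursday: Light Snow Shower, Maximum Temperature: 2°C (36°F) "
        "Minimum Temperature: -1°C (30°F)</title>")
print(format_temps(line))
```

Returning None when nothing matches lets the caller skip lines without the temperatures instead of crashing on m.group().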

Have fun!

How to pull certain words from a link and print them