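I'm still very new to Python, but I'm trying to write code that parses NOAA weather reports and displays them in the order of our radio broadcast.
I've managed to put together a list of current conditions using Python regular expressions: the HTML file is cut into a list of lines, and the matching lines are then output again in the right order. This works because each entry is a single line of data. The code looks like this: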
```python
# another function downloads
# http://www.arh.noaa.gov/wmofcst_pf.php?wmo=ASAK48PAFC&type=public
# and renames it currents.html
import re

from bs4 import BeautifulSoup as bs

with open('currents.html') as f:
    soup = bs(f, 'html.parser')

# get_text() is safer than .string, which is None if <pre> has child tags
weatherRaw = soup.pre.get_text()
towns = ['PAOM', 'PAUN', 'PAGM', 'PASA']
townOut = []
weatherLines = weatherRaw.splitlines()
for i in range(len(towns)):
    p = re.compile(towns[i] + '.*')
    for line in weatherLines:
        matched = p.match(line)
        if matched:
            townOut.append(matched.group())
```
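Because each station code is a literal prefix of its line, the same reordering can be sketched without regular expressions. This is a minimal sketch, not the original code: `extract_in_order` is a hypothetical helper and the sample lines are made up for illustration, not real NOAA output. The output order follows the `towns` list (the broadcast order), not the order in the file.

```python
def extract_in_order(lines, codes):
    """Return the lines starting with each code, in the order of `codes`.

    `lines` is the report already split with splitlines(); the result is
    grouped by `codes` (the broadcast order), not by position in the file.
    """
    out = []
    for code in codes:
        # str.startswith does the same job as re.match(code + '.*') here,
        # because the station code is a literal prefix of the line.
        out.extend(line for line in lines if line.startswith(code))
    return out


# Hypothetical sample lines, in file order (not real NOAA data).
sample = [
    "PASA SAVOONGA     CLOUDY  20",
    "PAOM NOME         SNOW    15",
    "PAUN UNALAKLEET   CLEAR   10",
]
print(extract_in_order(sample, ['PAOM', 'PAUN', 'PAGM', 'PASA']))
```

A code with no matching line (here `PAGM`) simply contributes nothing, as in the regex version.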
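Now I'm working on the forecast section, and I've hit a problem: each forecast runs over several lines, and I've already cut the file into a list of lines.
So what I'm looking for is an expression that lets me use a similar loop, this time appending from the line where the match is found and stopping at a line that contains only "&&". Something like this: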
```python
import re

# sample data from http://www.arh.noaa.gov/wmofcst.php?wmo=FPAK52PAFG&type=public
# BeautifulSouped into list fcst (forecast.pre.get_text().splitlines())
zones = ['AKZ214', 'AKZ215', 'AKZ213']  # note the out-of-numerical-order zones
weatherFull = []
for i in range(len(zones)):
    start = re.compile(zones[i] + '.*')
    end = re.compile('&&')
    for line in fcst:
        matched = start.match(line)
        if matched:
            weatherFull.append(matched.group())
            # and the other lines of various contents and length
            # until reaching the end match object
```
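The start/stop loop described above can be sketched with a simple flag over the existing line list. This is a minimal sketch, assuming (as in the question) that each zone block starts on a line beginning with the zone code and ends at a line that is exactly "&&"; `extract_blocks`, `end_marker` and the sample lines are hypothetical, not real NOAA output. The terminator is a parameter in case the real product uses something else.

```python
def extract_blocks(lines, zones, end_marker='&&'):
    """Collect each zone's forecast block, keyed in the order of `zones`.

    A block starts at a line beginning with the zone code and runs up to
    (not including) the next line that is exactly `end_marker`.
    """
    blocks = {}
    for zone in zones:
        capturing = False
        block = []
        for line in lines:
            if not capturing and line.startswith(zone):
                capturing = True  # found the header line for this zone
            if capturing:
                if line.strip() == end_marker:
                    break  # stop at the terminator line
                block.append(line)
        blocks[zone] = block  # empty list if the zone never appears
    return blocks


# Hypothetical sample, already split into lines (not real NOAA data).
sample_fcst = [
    "AKZ213-",
    "Zone 213 forecast line 1",
    "&&",
    "AKZ214-",
    "Zone 214 forecast line 1",
    "Zone 214 forecast line 2",
    "&&",
]
for zone, block in extract_blocks(sample_fcst, ['AKZ214', 'AKZ215', 'AKZ213']).items():
    print(zone, block)
```

Dictionaries keep insertion order (Python 3.7+), so iterating over the result follows the broadcast order given in `zones`.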
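What can I do to improve this code? I know it's very verbose, but while I'm starting out I like being able to keep track of what I'm doing. Thanks in advance!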
Answer 0 (score: 0)
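Apologies if this isn't what you're after (in which case I'm happy to adjust). It's great that you're using BeautifulSoup, but you can actually take it a step further. Looking at the HTML, each block seems to begin with an `<a name=zone>` element and end at the next `<a name=zone>`. In that case, you can do the following to extract the corresponding HTML for each zone: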
```python
from bs4 import BeautifulSoup

# I put the HTML in a file, but you could also fetch the page from the URL
with open('weather.html', 'r') as f:
    fcst = f.read()

# Turn the HTML into a navigable soup object
soup = BeautifulSoup(fcst, 'html.parser')

# Define your zones
zones = ['AKZ214', 'AKZ215', 'AKZ213']
weatherFull = []

# This is a more Pythonic loop structure: instead of looping over
# range(len(zones)), simply iterate over each element itself
for zone in zones:
    # Use BS's built-in find() to locate the 'a' element whose
    # name attribute equals the zone in question (this is the pattern).
    zone_node = soup.find('a', {'name': zone})
    # Keep walking through the siblings after the 'a' tag until we hit
    # another 'a' tag or run out of siblings (this depends heavily on
    # the page structure :) )
    while zone_node is not None:
        weatherFull.append(zone_node)
        # Move on to the next sibling node
        zone_node = zone_node.next_sibling
        # If the next node is an 'a' tag, stop and go to the next zone
        if getattr(zone_node, 'name', None) == 'a':
            break

# Process weatherFull however you like
print(weatherFull)
```
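The loop above collects raw nodes (tags and strings). If plain text per zone is more useful, a minimal sketch is below. `zone_text` and the inline HTML are hypothetical, and it assumes, like the answer, that each `<a name=...>` anchor and its forecast text are siblings inside the same parent element. It returns None when a zone's anchor is missing.

```python
from bs4 import BeautifulSoup


def zone_text(soup, zone):
    """Return the text between <a name=zone> and the next <a> sibling.

    Returns None if the anchor is missing. Assumes the anchors and the
    forecast text are siblings inside the same parent (e.g. one <pre>).
    """
    node = soup.find('a', attrs={'name': zone})
    if node is None:
        return None
    parts = []
    node = node.next_sibling
    while node is not None and getattr(node, 'name', None) != 'a':
        # Text nodes are NavigableStrings (a str subclass); tags need get_text().
        parts.append(str(node) if isinstance(node, str) else node.get_text())
        node = node.next_sibling
    return ''.join(parts).strip()


# Hypothetical page fragment (not real NOAA markup).
html = """<pre><a name="AKZ213"></a>AKZ213-
Zone 213 text.
&&
<a name="AKZ214"></a>AKZ214-
Zone 214 text.
&&
</pre>"""
soup = BeautifulSoup(html, 'html.parser')
print(zone_text(soup, 'AKZ214'))
```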
Hope this helps (or at least gets you close to what you want!).