Getting a text file with temperature values

Date: 2014-03-03 06:50:16

Tags: python datetime python-2.7

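In the end I should have a .txt file containing a temperature value for every day of 2009. The problem is that the file this code creates contains only 12 values (one per month), and about half of them are for dates that do not exist (e.g. April 31).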

I'm new to Python. I have compared my code against my textbook several times and can't find any difference.

import urllib2
from bs4 import BeautifulSoup

#CSV
f = open('wunder-data.txt', 'w')

#months, days
for m in range(1, 13):
    for d in range(1, 32):

     #get if already gone through month
     if (m == 2 and d > 28):
       break
     elif (m in [4, 6, 9, 11] and d > 30):
       break

     #open wunderground.com url
     timestamp = '2009' + str(m) + str(d)
     print "Getting data for " + timestamp
     url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
     page = urllib2.urlopen(url)

    #get temp from page
    soup = BeautifulSoup(page)
    #dayTemp = soup.body.nobr.b.string
    dayTemp = soup.findAll(attrs={"class":"nobr"})[4].span.string

    #Format month for timestamp
    if len(str(m)) < 2:
     mStamp = '0' + str(m)
    else:
     mStamp = str(m)

    #Format day for timestamp
    if len(str(d)) < 2:
     dStamp = '0' + str(d)
    else:
     dStamp = str(d)

    #Build timestamp
    timestamp = '2009' + mStamp + dStamp

    #Write timestamp and temperature to file
    f.write(timestamp + ',' + dayTemp + '\n')

# Done getting data! Close file.
f.close()

1 Answer:

Answer 0 (score: 0)

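Your code has an indentation problem. The part from #get if already.. to page = urllib2.urlopen(url) is indented one level deeper than the rest, so only that part belongs to the inner loop. Parsing the page content and writing to the file happen in the outer loop, once per month. That is why you only capture one value per month, taken at the point where the inner loop stopped. Several of those are not real dates, because the inner loop counts up to 31 for every month and d still holds the value that triggered the break.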
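To make the effect visible without touching the network, here is a minimal sketch that reproduces only the loop structure from the question. The function name dates_written_by_buggy_loop and the is_real_date helper are made up for this illustration; the download and parsing steps are left out on purpose.

```python
import datetime


def dates_written_by_buggy_loop():
    """Reproduce only the loop structure of the question (no downloads)."""
    written = []
    for m in range(1, 13):
        for d in range(1, 32):
            # Inner loop body: in the question, only the date check and
            # the urlopen call are indented this far.
            if m == 2 and d > 28:
                break
            elif m in [4, 6, 9, 11] and d > 30:
                break
        # Dedented code runs once per month, and d keeps whatever value
        # it had when the inner loop ended or hit break.
        written.append((m, d))
    return written


def is_real_date(year, month, day):
    """Return True if the calendar date exists."""
    try:
        datetime.date(year, month, day)
        return True
    except ValueError:
        return False


result = dates_written_by_buggy_loop()
invalid = [(m, d) for (m, d) in result if not is_real_date(2009, m, d)]
```

With this structure, result holds exactly one (month, day) pair per month, and invalid lists the pairs that are not calendar dates, such as (4, 31).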

You can use datetime to iterate correctly over the days of the year, for example:

import datetime
import urllib2

f = open('wunder-data.txt', 'w')

d = datetime.datetime(2009, 1, 1)
end_date = datetime.datetime(2010, 1, 1)
delta = datetime.timedelta(days=1)
while d < end_date:
    print "Getting data for " + d.strftime("%Y-%m-%d")
    # The URL path is year/month/day, so the month goes before the day
    url = "http://www.wunderground.com/history/airport/KBUF/2009/%d/%d/DailyHistory.html" % (d.month, d.day)
    page = urllib2.urlopen(url)

    #process web content and write to file

    d += delta

# Done getting data! Close file.
f.close()
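As a rough sketch of how the pieces could fit together, the date iteration can be separated from the download so that it is easy to check on its own. The names daily_requests, write_temperatures and fetch_temp are hypothetical and not part of the original code. fetch_temp stands in for whatever downloads and parses a page (for example the urllib2 and BeautifulSoup calls from the question) and is assumed to return the temperature as a string. strftime("%Y%m%d") also replaces the manual zero-padding of month and day.

```python
import datetime

# Same URL pattern as in the question, with the year made a parameter
BASE_URL = "http://www.wunderground.com/history/airport/KBUF/%d/%d/%d/DailyHistory.html"


def daily_requests(year):
    """Yield (timestamp, url) for every real calendar day of the year."""
    d = datetime.date(year, 1, 1)
    end_date = datetime.date(year + 1, 1, 1)
    delta = datetime.timedelta(days=1)
    while d < end_date:
        # strftime zero-pads month and day, e.g. 20090105
        timestamp = d.strftime("%Y%m%d")
        url = BASE_URL % (d.year, d.month, d.day)
        yield timestamp, url
        d += delta


def write_temperatures(path, fetch_temp, year=2009):
    """Write one 'timestamp,temperature' line per day.

    fetch_temp is a hypothetical callable that takes a URL and returns
    the temperature as a string; downloading and parsing are left to it.
    """
    with open(path, "w") as f:
        for timestamp, url in daily_requests(year):
            f.write(timestamp + "," + fetch_temp(url) + "\n")
```

Calling write_temperatures('wunder-data.txt', my_fetch) with a suitable my_fetch function would then produce one line per day of 2009. Because the dates come from datetime, impossible dates such as April 31 never appear.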