IOError when web scraping Environment Canada with Python

Time: 2016-05-24 15:51:20

Tags: python sockets ftp web-scraping

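For work, I am trying to retrieve bulk data from the Environment Canada website, which provides its own instructions for doing so: ftp://ftp.tor.ec.gc.ca/Pub/Get_More_Data_Plus_de_donnees/Readme.txt

When I run my code, I always get error 10054: an existing connection was forcibly closed by the remote host. As a fairly novice programmer, I am wondering whether the website dislikes my program (I tested an early version of it against a provincial government website, and it seemed to retrieve information) or whether there is a specific error in my code that prevents me from connecting properly. Any suggestions on how to proceed are welcome. Thanks.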

Here is my code; the last try/except block is my attempt to retry the connection after getting the IOError message:

import math
import datetime
import sys
import os
import urllib

# out_folder is relative to local directory
# station id is arbitrary; figure this out from the Web site
#   by inspecting the URL of the stations Web page
[station, start_year, end_year, out_folder] = sys.argv[1:5]

print "retrieving data for station "+station+" for years "+start_year+" to "+end_year+" and saving in folder ./"+out_folder+"\n"

# generate filenames and download them
for year in range(int(start_year), int(end_year)+1):
    for month in range(1, 2):

        url = "http://climate.weather.gc.ca/climateData/bulkdata_e.html?format=csv&stationID="+str(station)+"&Year="+str(year)+"&Month="+str(month+1)+"&Day=1&timeframe=2&submit=Download+Data"
        filename = 'stn_'+str(station)+'_'+str(year)+'.csv'
        print 'stn_'+str(station)+'_'+str(year)+'.csv'
        try:
            print "Trying to retrieve data; please hold"
            urllib.urlretrieve(url, out_folder+'\\'+filename)
        except IOError:
            os.mkdir(out_folder)
            print "folder "+out_folder+" does not exist yet, creating it ...\n"
            try:
                print "Trying to retrieve data; please hold"
                urllib.urlretrieve(url, out_folder+'\\'+filename)
            except IOError:
                print "Trying to retrieve data; please hold"
                urllib.urlretrieve(url, out_folder+'\\'+filename)

exit()

Also, in case it helps, here are the last lines of the traceback:

File "C:\Python27\lib\socket.py", line 476 in readline

data = self._sock.recv(self._rbufsize)

IOError: [Errno socket error] [Errno 10054] An existing connection was forcibly...

1 Answer:

Answer 0 (score: 1)

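Apologies for the unnecessary question; the problem was that my own network was blocking the connection. I switched to Wi-Fi and the program ran.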
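
For anyone who lands here with the same error: in the question's code, every IOError is treated as "the output folder does not exist", so a network failure such as 10054 is handled by calling os.mkdir. Below is a minimal sketch of one way to keep the two cases apart and retry the download a few times. It is written for Python 3 (the question uses Python 2), and the names ensure_folder and retrieve_with_retry, along with the attempts, delay, fetch and sleep parameters, are hypothetical helpers made up for this illustration. It will not help if the network itself blocks the connection, as turned out to be the case here; it only makes the failure easier to read.

```python
import os
import time
import urllib.request


def ensure_folder(path):
    """Create the output folder up front, so a missing folder is never
    confused with a network failure later on."""
    os.makedirs(path, exist_ok=True)


def retrieve_with_retry(url, dest, attempts=3, delay=2.0,
                        fetch=urllib.request.urlretrieve, sleep=time.sleep):
    """Call fetch(url, dest), retrying when it raises OSError.

    In Python 3, IOError is an alias of OSError, and connection resets
    (ConnectionResetError) are a subclass of it. The wait doubles after
    each failed attempt; the last failure is re-raised to the caller.
    """
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url, dest)
        except OSError as exc:
            if attempt == attempts:
                raise
            print("attempt %d failed (%s); retrying in %s s" % (attempt, exc, wait))
            sleep(wait)
            wait *= 2


# Example use (assumes station, year, url and out_folder are set as in the question):
#   ensure_folder(out_folder)
#   dest = os.path.join(out_folder, "stn_%s_%s.csv" % (station, year))
#   retrieve_with_retry(url, dest)
```

Building the path with os.path.join instead of a hard-coded '\\' also keeps the script usable outside Windows.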