Question

I am using the following code to retrieve data files from an FTP server in a loop:

    response = urllib.request.urlopen(url)
    data = response.read()
    response.close()
    compressed_file = io.BytesIO(data)
    gin = gzip.GzipFile(fileobj=compressed_file)

Retrieving and processing the first few files works fine, but after a few requests I get the following error:

    530 Maximum number of connections exceeded.

I have tried closing the connection (see the code above) and adding a sleep() timer, but neither helped. What am I doing wrong here?

Answer 1

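Trying to make urllib do FTP properly makes my brain hurt. By default it creates a new connection for each file, apparently without really making sure the connection gets closed. I think ftplib is a better fit.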

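Since I happen to be working with the same data, this is a fairly specific answer: it decompresses the .gz files and passes them to ish_parser (https://github.com/haydenth/ish_parser). I think it is also clear enough to serve as a general answer.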

import ftplib
import io
import gzip
import ish_parser # from: https://github.com/haydenth/ish_parser

ftp_host = "ftp.ncdc.noaa.gov"
parser = ish_parser.ish_parser()

# identifies what data to get
USAF_ID = '722950'
WBAN_ID = '23174'
YEARS = range(1975, 1980)

with ftplib.FTP(host=ftp_host) as ftpconn:
    ftpconn.login()

    for year in YEARS:
        ftp_file = "pub/data/noaa/{YEAR}/{USAF}-{WBAN}-{YEAR}.gz".format(USAF=USAF_ID, WBAN=WBAN_ID, YEAR=year)
        print(ftp_file)

        # read the whole file and save it to a BytesIO (stream)
        response = io.BytesIO()
        try:
            ftpconn.retrbinary('RETR '+ftp_file, response.write)
        except ftplib.error_perm as err:
            if str(err).startswith('550 '):
                # 550: the file is not available on the server; report it and skip this year
                print('ERROR:', err)
                continue
            else:
                raise

        # decompress and parse each line 
        response.seek(0) # jump back to the beginning of the stream
        with gzip.open(response, mode='rb') as gzstream:
            for line in gzstream:
                parser.loads(line.decode('latin-1'))

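This does read the whole file into memory, which could probably be avoided with some clever wrapper and/or yield or something, but it works fine for a year's worth of hourly weather observations.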
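For what it's worth, here is one way the in-memory buffering might be avoided. It is only a sketch, not something tested against the NOAA server: it decompresses each chunk as it arrives using zlib.decompressobj and hands complete lines to a callback. The class name GzipLineFeeder and its on_line parameter are made up for this example, and it assumes each .gz file holds a single gzip member.

```python
import zlib


class GzipLineFeeder:
    """Hypothetical helper: decompress gzip chunks as they arrive and
    pass each complete line to a callback."""

    def __init__(self, on_line):
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        # Assumption: the stream contains a single gzip member.
        self._decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self._pending = b""  # decompressed bytes that do not yet end in a newline
        self._on_line = on_line

    def feed(self, chunk):
        """Decompress one chunk and emit every line that is now complete."""
        data = self._pending + self._decomp.decompress(chunk)
        lines = data.split(b"\n")
        self._pending = lines.pop()  # last piece is incomplete (or empty)
        for line in lines:
            # Keep the trailing newline, as iterating over a gzip file would
            self._on_line((line + b"\n").decode("latin-1"))

    def close(self):
        """Flush the decompressor and emit a final line that has no newline."""
        self._pending += self._decomp.flush()
        if self._pending:
            self._on_line(self._pending.decode("latin-1"))
            self._pending = b""
```

In the loop above this would be used roughly as feeder = GzipLineFeeder(parser.loads), then ftpconn.retrbinary('RETR ' + ftp_file, feeder.feed), then feeder.close(). One trade-off: if a download fails halfway, some lines have already been handed to the parser, which the BytesIO version avoids.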

Answer 2

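Probably a pretty nasty workaround, but it worked for me. I made a script (called test.py here) that performs the request (see the code above). The code below is used in the loop I mentioned and calls test.py: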

from subprocess import call

# append the output of each test.py run to a log file
with open('log.txt', 'a') as f:
    call(['python', 'test.py', args[0], args[1]], stdout=f)
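This presumably works because each request runs in its own process, and the operating system closes that process's sockets when it exits. A slightly more defensive variant of the same idea might look like the sketch below. The function name fetch_in_subprocess is made up for this example; it uses subprocess.run and sys.executable so that the child runs under the same interpreter as the caller, and it returns the exit code so the loop can notice failures.

```python
import subprocess
import sys


def fetch_in_subprocess(script, args, log_path="log.txt"):
    """Hypothetical helper: run `script` in a fresh interpreter and append
    its stdout to `log_path`. Returns the child's exit code."""
    with open(log_path, "a") as log:
        # sys.executable is the interpreter running this code, so the child
        # uses the same Python version and environment as the caller.
        result = subprocess.run([sys.executable, script, *args], stdout=log)
    return result.returncode
```

Inside the loop this would be called as fetch_in_subprocess('test.py', [args[0], args[1]]), and a non-zero return value means that request failed.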

Python 3 urllib: 530 too many connections in a loop
