Question

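I am trying to create a loop that downloads data from a website for the years 2014 through 2017. I wrote a simple loop that should download the data from the link below, but for 2015, 2016, and 2017. The only part of the link that needs to change is the year: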

https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h2014.txt.gz&dir=data/historical/stdmet/

Revised version:

import urllib

core = 'https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h'
year = 2014
end = '.txt.gz&dir=data/historical/stdmet/'

for i in range(0,3):

        year += 1
        year_fixed = str(year)
        urllib.urlretrieve(core+year_fixed+end)

The error I get is raised at the first URL:

AttributeError: module 'urllib' has no attribute 'urlretrieve'

For some reason it does not import any of the data for 2014 through 2017. Is there a better way to write this? Any help would be appreciated.

Answer 1

Using Python 3 (3.7 here) and the requests module, this can be simplified to:

import requests
for year in range(2014, 2018):
    url = f'https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h{year}.txt.gz&dir=data/historical/stdmet/'
    r = requests.get(url)
    print(r.text)

Instead of printing, you can save the output to a file.
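A minimal sketch of saving each year to its own file rather than printing it. The helper names `build_url` and `save_year`, the output file name pattern `42887h<year>.txt`, and the 30-second timeout are assumptions made for illustration; they are not part of the original answer.

```python
from pathlib import Path

BASE = "https://www.ndbc.noaa.gov/view_text_file.php"


def build_url(year):
    """Return the download URL for one year of the 42887 data file."""
    return f"{BASE}?filename=42887h{year}.txt.gz&dir=data/historical/stdmet/"


def save_year(year, out_dir="."):
    """Download one year and write the response text to a file.

    The file name 42887h<year>.txt is a hypothetical naming scheme.
    """
    import requests  # third-party; imported here so build_url works without it

    r = requests.get(build_url(year), timeout=30)
    r.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    path = Path(out_dir) / f"42887h{year}.txt"
    path.write_text(r.text, encoding="utf-8")
    return path
```

Calling `save_year(year)` for each `year` in `range(2014, 2018)` should then leave one text file per year in the current directory.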

Edit: for Python < 3.6, use str.format():

url = "https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h{}.txt.gz&dir=data/historical/stdmet/".format(year)

More on string formatting: https://realpython.com/python-f-strings/

Answer 2

The following works fine in Python 3. The loop saves each year's data to its own file after retrieving it.

import urllib.request

core = 'https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h'
end = '.txt.gz&dir=data/historical/stdmet/'

# Downloads 2015 through 2017; use range(2014, 2018) to include 2014 as well.
for i, year in enumerate(range(2015, 2018)):
    # Each year goes to its own file: text0.txt, text1.txt, text2.txt
    filename = "text" + str(i) + ".txt"
    urllib.request.urlretrieve(core + str(year) + end, filename)
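The AttributeError in the question comes from calling `urllib.urlretrieve`, which is where the function lived in Python 2; in Python 3 it is `urllib.request.urlretrieve`. Below is a hedged sketch of the same loop with basic error handling, so one failed year does not stop the rest. The function names `year_url` and `download_years` and the year-based file names are assumptions for illustration only.

```python
import urllib.error
import urllib.request

CORE = "https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h"
END = ".txt.gz&dir=data/historical/stdmet/"


def year_url(year):
    """Build the URL for one year by inserting it between the fixed parts."""
    return CORE + str(year) + END


def download_years(years):
    """Try to download each year; return (saved, failed) lists of years."""
    saved, failed = [], []
    for year in years:
        filename = "42887h" + str(year) + ".txt"  # hypothetical naming scheme
        try:
            urllib.request.urlretrieve(year_url(year), filename)
            saved.append(year)
        except urllib.error.URLError as exc:  # HTTPError is a subclass
            print("could not download", year, ":", exc)
            failed.append(year)
    return saved, failed
```

For example, `download_years(range(2014, 2018))` would attempt all four years and report which ones failed.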

Creating a loop to import data from multiple URLs
