Question

I have two lists, one of dates and one of hours, and I am scanning a website looking for content.

date = ['1','2','3']
hour = ['1','2','3']

I wrote the following while/for loops to go through the dates and hours and open every combination:

datetotry = 0
while (datetotry < len(date)):
    for i in range(len(hour)):
        print ('https://www.website.org/backups/backup_2004-09-'+date[datetotry]+'_'+hour[i]+".sql")
        html = opener.open('https://www.website.org/backups/backup_2004-09-'+date[datetotry]+'_'+hour[i]+'.sql').read()
    datetotry += 1  # must sit inside the while loop, otherwise the loop never ends
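
One way to rule out a mismatch between the printed URL and the requested URL is to build the string once and reuse it for both. A minimal sketch, assuming the same `date` and `hour` lists as above; the helper name `build_urls` and the constant `BASE` are hypothetical, and nothing is requested here:

```python
from itertools import product

date = ['1', '2', '3']
hour = ['1', '2', '3']

# Assumed prefix, copied from the loop above.
BASE = 'https://www.website.org/backups/backup_2004-09-'

def build_urls(dates, hours):
    """Return one URL per (date, hour) combination, with dates varying slowest."""
    return [BASE + d + '_' + h + '.sql' for d, h in product(dates, hours)]

urls = build_urls(date, hour)
for url in urls:
    print(url)
```

Each entry in `urls` could then be passed unchanged to whatever opens the page, so the printed string and the requested string cannot differ.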

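When the console prints the URL, it looks fine: the variables are replaced by the numbers from the lists.

But the variables may not be getting replaced in the actual URL request.

The code was stopping because of a 404 error, but I think I have handled that using the information I found here: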

https://docs.python.org/3/howto/urllib2.html#wrapping-it-up

The first part of the 404 error showed the

date[datetotry]+'_'+hour[i]+

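part, rather than the items from the lists as when printing to the console.

Does this mean I have to do something like urllib.parse.urlencode to actually substitute the variables?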
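
As a quick check on that question: `+` concatenation produces an ordinary string before anything is sent, so the values should already be substituted. A small sketch, assuming the same lists; `urllib.parse.quote` is used only to show that this particular URL contains no characters that would need percent-encoding:

```python
from urllib.parse import quote

date = ['1', '2', '3']
hour = ['1', '2', '3']

# Concatenation yields the final string; no variable names remain in it.
url = 'https://www.website.org/backups/backup_2004-09-' + date[0] + '_' + hour[0] + '.sql'
print(url)

# quote() never escapes letters, digits, '_', '.', '-' or '~';
# ':' and '/' are kept here by listing them in safe=.
print(quote(url, safe=':/') == url)
```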

I imported the libraries mentioned in the article and changed the code to:

from urllib.error import URLError, HTTPError
from urllib.request import Request, urlopen

while (datetotry < len(date)):
    for i in range(len(hour)):
        html = Request('https://www.website.org/backups/backup_2004-09-'+date[datetotry]+'_'+hour[i]+'.sql')
        try:
            response = urlopen(html)
        except HTTPError as e:  # HTTPError subclasses URLError, so it has to be caught first
            print('The server couldn\'t fulfill the request.')
            print('Error code: ', e.code)
        except URLError as e:
            print('We failed to reach a server.')
            print('Reason: ', e.reason)
        else:
            pass  # body not shown in the original post
    datetotry += 1  # advance to the next date so the while loop terminates

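So the code now actually runs instead of stopping because a 404 was returned. What is the best way to see what it is actually requesting? Do I have to do some kind of encoding? I am new to programming, and to Python 3 in particular.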
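
One possible way to see what is being requested is to inspect the `Request` object before it is opened. A minimal sketch, assuming the URL pattern from the question; the helper name `fetch` and the 10-second timeout are hypothetical choices, and `fetch` is only defined here, not called:

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def fetch(url):
    """Try to open url; return (requested_url, status code or error description)."""
    req = Request(url)
    # full_url is the exact URL string held by the Request object.
    print('Requesting:', req.full_url)
    try:
        with urlopen(req, timeout=10) as response:
            return req.full_url, response.status
    except HTTPError as e:
        # HTTPError is a subclass of URLError, so it is caught first.
        return req.full_url, 'HTTP error %d' % e.code
    except URLError as e:
        return req.full_url, 'unreachable: %s' % e.reason

# Inspecting a Request does not touch the network.
req = Request('https://www.website.org/backups/backup_2004-09-1_1.sql')
print(req.full_url, req.get_method())
```

Calling `fetch(url)` for each combination would print every URL just before it is requested, and a 404 would come back as a result instead of stopping the loop.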

Python 3 variables in a request causing a 404?

0 answers: