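I'm new to Python and I'm having trouble with encodings and URLs. My goal is to download a list of URLs stored in a text file. My script works fine for most entries, but it fails on URLs that contain French accented characters (é, è, à, etc.).
Here is my code: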
#!/usr/bin/env python
# coding: utf8
import urllib.request
import os
import codecs
import io
# Variables settings
URL = ""
finalFileName = ""
listFiles = "fichiers.txt"
nbLines = 0
currentLine = 1
# Open the file
print ("Open the source file...")
file = open(listFiles, "r")
lines = file.readlines()
# Get line numbers
for line in lines:
    nbLines += 1
file.close()
# Download the file
print ("Download the " + str(nbLines) + " files started")
# Read the file line per line
for line in lines :
    URL = line.replace("\n", "")
    finalFileName = os.path.basename(URL)
    print ("Download " + finalFileName + " [" + str(currentLine) + "/" + str(nbLines) + "]")
    # Download the file
    urllib.request.urlretrieve (URL,finalFileName)
    # Incrementing count
    currentLine += 1
print ("Done")
Then I get this error:
Download racers-saturewood-300x225.jpg [15/993]
Download _81______r-s-oil-top-finish_363.jpg [16/993]
Download traitement_thermo_traite.jpg [17/993]
Download Blanchiment-du-Douglas-exposé-NORD-150x150.jpg [18/993]
Traceback (most recent call last):
  File "D:\Bureau\images-site\dlimage.py", line 39, in <module>
    urllib.request.urlretrieve (URL,finalFileName)
  File "C:\Python34\lib\urllib\request.py", line 186, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 463, in open
    response = self._open(req, data)
  File "C:\Python34\lib\urllib\request.py", line 481, in _open
    '_open', req)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 1210, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python34\lib\urllib\request.py", line 1182, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Python34\lib\http\client.py", line 1088, in request
    self._send_request(method, url, body, headers)
  File "C:\Python34\lib\http\client.py", line 1116, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Python34\lib\http\client.py", line 973, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 65-66: ordinal not in range(128)
I tried a few options, without success:
URL.encode('utf8')
(refuses to convert the characters: UnicodeEncodeError: 'ascii' codec can't encode characters in position 65-66: ordinal not in range(128))
URL.decode()
(doesn't work)
I'm lost and I don't know how to solve this. Can you help me?
Thanks. Regards, Arthur
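
One possible direction, as a minimal sketch rather than a definitive fix: the traceback shows http.client encoding the request line as ASCII, so the accented characters probably need to be percent-encoded before the URL reaches urlretrieve. In Python 3, URL.encode('utf8') only returns a new bytes object and leaves URL unchanged, and str has no decode() method, which would explain why neither attempt helped. The helper names to_ascii_url and download_all and the example.com URL are made up for illustration. The sketch also assumes that fichiers.txt is saved as UTF-8, that the server expects UTF-8 percent-encoding, and that the host name itself is plain ASCII.

```python
import os
import urllib.request
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path and query (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep "/" and "%" so path separators and already-encoded sequences survive.
    path = quote(parts.path, safe="/%")
    # Keep the usual query delimiters intact as well.
    query = quote(parts.query, safe="=&%+")
    # Assumption: the host name (netloc) is already ASCII and is left untouched.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


def download_all(list_path="fichiers.txt"):
    """Download every URL listed in list_path, one URL per line."""
    # Assumption: the list file is UTF-8; adjust the encoding if it is not.
    with open(list_path, "r", encoding="utf-8") as source:
        urls = [line.strip() for line in source if line.strip()]

    for index, url in enumerate(urls, start=1):
        # The local file name keeps its accents; only the request URL is encoded.
        file_name = os.path.basename(urlsplit(url).path)
        print("Download {} [{}/{}]".format(file_name, index, len(urls)))
        urllib.request.urlretrieve(to_ascii_url(url), file_name)
    print("Done")


if __name__ == "__main__":
    # Made-up URL, used only to show the encoding step.
    print(to_ascii_url("http://example.com/img/exposé-NORD-150x150.jpg"))
```

Calling download_all() would replace the loop in the original script. If the server was set up with a different encoding (Latin-1, for example), quote accepts an encoding argument that could be changed accordingly.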