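Can't convert 'bytes' object to str with URL and urllib

Asked: 2015-09-08 11:54:21

Tags: python encoding byte urllib

I'm new to Python and I'm having trouble with encoding and URLs. My goal is to download every URL listed in a text file. My script works well, except that it fails on URLs containing French accented characters (é, è, à, etc.).

Here is my code: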

#!/usr/bin/env python
# coding: utf8

import urllib.request
import os
import codecs
import io

# Variables settings

URL = ""
finalFileName = ""
listFiles = "fichiers.txt"
nbLines = 0
currentLine = 1

# Open the file

print ("Open the source file...")

file = open(listFiles, "r")
lines = file.readlines()
# Get line numbers
for line in lines:
    nbLines += 1
file.close()

# Download the file

print ("Download the " + str(nbLines) + " files started")

# Read the file line per line
for line in lines :

    URL = line.replace("\n", "")
    finalFileName= os.path.basename(URL)
    print ("Download " + finalFileName + " [" + str(currentLine) + "/" + str(nbLines) + "]")
    # Download the file
    urllib.request.urlretrieve (URL,finalFileName)
    # Incrementing count
    currentLine += 1

print ("Done")

Then I get this error:

Download racers-saturewood-300x225.jpg [15/993]
Download _81______r-s-oil-top-finish_363.jpg [16/993]
Download traitement_thermo_traite.jpg [17/993]
Download Blanchiment-du-Douglas-exposé-NORD-150x150.jpg [18/993]
Traceback (most recent call last):
  File "D:\Bureau\images-site\dlimage.py", line 39, in <module>
    urllib.request.urlretrieve (URL,finalFileName)
  File "C:\Python34\lib\urllib\request.py", line 186, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 463, in open
    response = self._open(req, data)
  File "C:\Python34\lib\urllib\request.py", line 481, in _open
    '_open', req)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 1210, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python34\lib\urllib\request.py", line 1182, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Python34\lib\http\client.py", line 1088, in request
    self._send_request(method, url, body, headers)
  File "C:\Python34\lib\http\client.py", line 1116, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Python34\lib\http\client.py", line 973, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 65-66: ordinal not in range(128)
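
The last frame shows where it breaks: http.client encodes the request line as ASCII before sending it, so any non-ASCII character left in the URL path makes that call fail. A minimal sketch that reproduces only that step, assuming a made-up request line (the real URL is not shown above):

```python
# Hypothetical request line; only the accented character matters here.
request_line = "GET /images/exposé-NORD-150x150.jpg HTTP/1.1"

try:
    # Same kind of call as the last frame of the traceback.
    request_line.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)
```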

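I tried a few options without success:

URL.encode('utf8') (refuses to convert the characters: UnicodeEncodeError: 'ascii' codec can't encode characters in position 65-66: ordinal not in range(128))
URL.decode() (doesn't work)

I'm lost and I don't know how to solve this problem. Can you help me?

Thanks.
Regards,
Arthur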
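
One approach that usually helps with this kind of error is to percent-encode the non-ASCII parts of the URL before passing it to urllib. The sketch below is only an illustration under stated assumptions: `quote_url` is a hypothetical helper name, the example URL is made up, and the host name is assumed to be plain ASCII (internationalized domain names would need separate handling).

```python
import urllib.parse


def quote_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper, not part of urllib.
    """
    parts = urllib.parse.urlsplit(url)
    # "%" is kept as safe so an already-encoded URL is not encoded twice.
    path = urllib.parse.quote(parts.path, safe="/%")
    query = urllib.parse.quote(parts.query, safe="=&%+")
    return urllib.parse.urlunsplit(
        (parts.scheme, parts.netloc, path, query, parts.fragment)
    )


# Hypothetical URL, for illustration only.
url = "http://example.com/images/Blanchiment-du-Douglas-exposé-NORD-150x150.jpg"
safe_url = quote_url(url)
print(safe_url)
```

In the script above this would mean calling urllib.request.urlretrieve(quote_url(URL), finalFileName). If fichiers.txt is saved as UTF-8 (an assumption, the question does not say), opening it with open(listFiles, "r", encoding="utf-8") may also be needed so that the accented characters are read correctly on Windows.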

0 answers:

No answers