Question

I'm new to Python and am trying to write a script that downloads a CSV file. I'm using Python 3.6.1. Here is the code:

from urllib import request

demo_csv_url = 'http://www.sample-videos.com/csv/Sample-Spreadsheet-100-rows.csv'

def downloadCSV(url):
    response = request.urlopen(url)
    csv = response.read()
    csvStr = str(csv)
    lines = csvStr.split('\\n')
    dest = r'csv.csv'
    fx = open(dest,"w")
    for line in lines:
        fx.write(line + '\n')
    fx.close()


downloadCSV(demo_csv_url)

When I run the script, I get the following error:

Traceback (most recent call last):
  File "C:\Users\Vivek\Desktop\py tutorials\download_csv.py", line 23, in <module>
    downloadCSV(demo_csv_url)
  File "C:\Users\Vivek\Desktop\py tutorials\download_csv.py", line 12, in downloadCSV
    response = request.urlopen(url)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "D:\softwares\installed softwares\python\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I tried adding more headers, for example:

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

and then opened the URL with response = request.urlopen(url,hdr), but that raises even more errors. Could you tell me what I'm doing wrong here? Thanks.

Answer 1

The URL also returns a 403 when you visit it directly in a browser, so the script appears to be working as expected. If you want to handle the 403, use try/except.
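
A minimal sketch of the try/except approach, assuming you only want to report the failure and carry on. The function name fetch_bytes and its opener parameter are made up for illustration (the parameter just makes the function easy to exercise without a network); HTTPError and URLError come from urllib.error.

```python
from urllib import request
from urllib.error import HTTPError, URLError


def fetch_bytes(url, opener=request.urlopen):
    """Return the response body as bytes, or None if the request fails.

    `opener` is a hypothetical hook for illustration; by default the
    standard urllib.request.urlopen is used.
    """
    try:
        with opener(url) as response:
            return response.read()
    except HTTPError as err:
        # HTTPError carries the status code (e.g. 403) and the reason phrase.
        print(f"Server returned HTTP {err.code}: {err.reason}")
        return None
    except URLError as err:
        # Lower-level failures such as DNS errors or refused connections.
        print(f"Could not reach the server: {err.reason}")
        return None
```

HTTPError is a subclass of URLError, so it has to be caught first if the two cases are to be handled differently.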

If the content is protected by an Auth header or a cookie, you need to work out what those are and add them to the request.
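
As a rough sketch of how headers are attached with urllib: they go on a Request object, because the second positional argument of urlopen is data (the request body), not headers, which is likely why urlopen(url, hdr) raised further errors. The helper names build_request and download_csv are invented for this example, and which headers or cookies the server actually expects is something you would have to find out yourself; browser-like headers alone may not lift the 403.

```python
from urllib import request


def build_request(url, headers):
    """Create a GET request that carries the given headers."""
    # Headers are passed by keyword; passing them positionally to
    # urlopen would be treated as POST data instead.
    return request.Request(url, headers=headers)


def download_csv(url, dest, headers):
    """Download `url` to the file `dest`, sending `headers` with the request."""
    req = build_request(url, headers)
    with request.urlopen(req) as response:
        raw = response.read()
    # Write the bytes unchanged; calling str() on them would produce
    # the "b'...'" representation rather than the file contents.
    with open(dest, "wb") as fx:
        fx.write(raw)
```

A call would look like download_csv(demo_csv_url, 'csv.csv', hdr), reusing the names from the question.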

Answer 2

You need to authenticate to access this data; you need to supply some kind of "username" and "password".
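
If the site happens to use HTTP Basic authentication (an assumption; neither answer confirms the scheme), supplying a username and password with urllib could look roughly like this. The function name build_auth_opener and the credentials shown in the usage note are placeholders.

```python
from urllib import request


def build_auth_opener(base_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges for base_url."""
    # The default-realm manager applies the credentials to any realm
    # under base_url, so the realm name need not be known in advance.
    password_mgr = request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, base_url, username, password)
    handler = request.HTTPBasicAuthHandler(password_mgr)
    return request.build_opener(handler)
```

The opener is then used in place of urlopen, for example opener = build_auth_opener('http://www.sample-videos.com/', 'your-username', 'your-password') followed by opener.open(demo_csv_url). Sites that use a login form or token instead of Basic auth need a different approach.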

Python urllib.error.HTTPError: HTTP Error 403: Forbidden
