Why can I access an https web page containing public files, but cannot download them with a Python script?

Asked: 2017-03-04 22:59:39

Tags: python https

What should my username and password be?

import requests
import shutil

url = "https://www.sec.gov/Archives/edgar/daily-index/2017/QTR1/company.20170111.idx.txt"    

#Note: It's https

r = requests.get(url, auth=('', ''), verify=False,stream=True)

r.raw.decode_content = True

with open("company.20170111.idx.txt", 'wb') as f:
    shutil.copyfileobj(r.raw, f) 

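2 answers:

Answer 0 (score: 0)

The URL you are trying to load should be:

https://www.sec.gov/Archives/edgar/daily-index/2017/QTR1/company.20170103.idx

You were also missing import requests, and the server does not like the auth parameter either.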

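import shutil
import requests

# Same file as in the question, but without the trailing ".txt".
url = "https://www.sec.gov/Archives/edgar/daily-index/2017/QTR1/company.20170111.idx"

# No auth argument: the file is public. verify=False is carried over from the
# question; it disables certificate checks.
r = requests.get(url, verify=False, stream=True)

# Have urllib3 undo any gzip/deflate encoding while the raw stream is copied.
r.raw.decode_content = True

with open("company.20170111.idx.txt", 'wb') as f:
    shutil.copyfileobj(r.raw, f)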
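
A slightly more defensive variant of the snippet above might look like the sketch below. It keeps the same streaming copy but checks the HTTP status before writing, so an error page is not saved as if it were the index file. The function name `download`, the `headers` parameter, and the 30-second timeout are assumptions of this sketch, not part of the answer. `session` is assumed to be a `requests.Session()` or any object with a compatible `get` method.

```python
import shutil


def download(session, url, dest, headers=None):
    """Stream `url` into the file `dest` and return the HTTP status code.

    `session` is assumed to behave like requests.Session. No auth argument
    is sent, since the files are public.
    """
    # `headers` is a hypothetical hook, e.g. for a descriptive User-Agent.
    response = session.get(url, headers=headers, stream=True, timeout=30)
    try:
        # Raise on 4xx/5xx instead of writing an error page to disk.
        response.raise_for_status()
        # Undo gzip/deflate content encoding while copying the raw stream.
        response.raw.decode_content = True
        with open(dest, "wb") as f:
            shutil.copyfileobj(response.raw, f)
        return response.status_code
    finally:
        response.close()
```

With requests installed, this could be called as `download(requests.Session(), url, "company.20170111.idx.txt")`. Certificate verification is left at the library default here rather than disabled.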

Answer 1 (score: 0)

This works fine. I don't know why:

import urllib2
url = "https://www.sec.gov/Archives/edgar/daily-index/2017/QTR1/company.20170111.idx"
r = urllib2.urlopen(url)
for l in r:
    print l
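
The snippet above is Python 2. A rough Python 3 equivalent is sketched below, since `urllib2` was split into `urllib.request` and `urllib.error` there. It probably works for the same reasons as the previous answer: the URL has no `.txt` suffix and no credentials are sent. The helper name `iter_lines`, the timeout, and the `latin-1` default encoding are assumptions of this sketch; the file's actual encoding is not stated in the thread.

```python
import urllib.request


def iter_lines(url, encoding="latin-1"):
    """Yield the decoded lines of the resource at `url`, without line endings."""
    # The encoding default is an assumption; latin-1 never fails to decode.
    with urllib.request.urlopen(url, timeout=30) as response:
        # The response object can be iterated line by line, as in the
        # urllib2 version.
        for raw_line in response:
            yield raw_line.decode(encoding).rstrip("\r\n")
```

Usage would mirror the original loop: `for line in iter_lines(url): print(line)`.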