Adding cookie data to requests.urlretrieve

Time: 2018-01-02 13:52:27

Tags: python cookies python-requests urlretrieve

I'm trying to download a .torrent file from a password-protected website. I've managed to get into the site using cookies like this:

import requests
from requests.exceptions import RequestException
from bs4 import BeautifulSoup

cookies = {'uid': '232323', 'pass': '31321231jh12j3hj213hj213hk',
           '__cfduid': 'kj123kj21kj31k23jkl21j321j3kl213kl21j3'}
try:
    # read site content
    read = requests.get(s_string, cookies=cookies).content
except RequestException as e:
    # report the failure, then re-raise the original exception
    print('Could not connect to somesite: %s' % e)
    raise

soup = BeautifulSoup(read, 'html.parser')

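With the code above I can access the site and scrape the data I need. From the scraped data I build a link to a .torrent file, which I then want to download, and this is where I'm stuck.

Here is what I'm trying right now (the cookie values are obviously not the real ones, just as in the code above):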

from urllib import request

cookies = {'uid': '232323', 'pass': '31321231jh12j3hj213hj213hk',
           '__cfduid': 'kj123kj21kj31k23jkl21j321j3kl213kl21j3'}

# construct download URL
torrent_url = 'https://www.somesite.com/' + torrent_url
# for testing purposes DELETE!
print('torrent link:', torrent_url)

# download torrent file into a folder
filename = torrent_url.split('/')[-1]
save_as = 'torrents/' + filename + '.torrent'

try:
    r = request.urlretrieve(torrent_url, save_as, data=cookies)
    print("Download successful for: " + filename)
except request.URLError as e:
    print("Error :%s" % e)
    raise

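This code works fine on ordinary sites that don't need cookies, but the .torrent file I'm trying to get sits behind a password/captcha-protected site, so I need to send the cookies to fetch it.

So the question is: what am I doing wrong here? Without data=cookies I get an HTTP 404 error; with data=cookies I get the following error: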

File "/usr/lib/python3.6/http/client.py", line 1064, in _send_output
    + b'\r\n'
TypeError: can't concat str to bytes

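P.S. Before anyone asks: yes, I'm 100% sure torrent_url is correct. I printed it and copy/pasted it into my own browser, which brought up the download dialog for the .torrent file in question.

Edit: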

try:
    read = requests.session().get(torrent_url)
    with open(save_as, 'wb') as w:
        for chunk in read.iter_content(chunk_size=1024):
            if chunk:
                w.write(chunk)
    # the with block closes the file, so no explicit w.close() is needed
    print("Download successful for: " + filename)
except RequestException as e:
    print("Error :%s" % e)

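I did this based on furas's suggestion and it now runs, but when I try to open the .torrent, the torrent client says "invalid encoding, cannot open".

When I open the .torrent file, this is what's inside: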

<h1>Not Found</h1>
<p>Sorry pal :(</p>
<script src="/cdn-cgi/apps/head/o1wasdM-xsd3-9gm7FQY.js"></script>

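Am I still doing something wrong, or does this have something to do with the site owner blocking programs from downloading .torrents from his site, or something of that nature?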

1 answer:

Answer 0 (score: 1)

This works, but I don't think it's ideal.

import requests
from requests.exceptions import RequestException

cookies = {'uid': '232323', 'pass': '31321231jh12j3hj213hj213hk',
           '__cfduid': 'kj123kj21kj31k23jkl21j321j3kl213kl21j3'}

try:
    # send the login cookies along with the download request
    read = requests.get(torrent_url, cookies=cookies)
    with open(save_as, 'wb') as w:
        for chunk in read.iter_content(chunk_size=512):
            if chunk:
                w.write(chunk)
    # report success once, after the whole file has been written
    print(filename + ' downloaded successfully!!!')
except RequestException as e:
    print("Error :%s" % e)
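
A note on why the urlretrieve attempt in the question fails: the data argument of urlretrieve is the request body (passing it turns the request into a POST) and is expected to be bytes, which would explain the TypeError when a dict is handed over. Cookies travel in a Cookie request header instead. If you would rather stay with the standard library, a minimal sketch might look like the following. The helper names cookie_header, looks_like_torrent and download_torrent are made up for illustration, and the check for a leading b'd' is only a heuristic based on .torrent files being bencoded dictionaries; it is there so that an HTML error page is not silently saved as a .torrent.

```python
import urllib.request


def cookie_header(cookies):
    """Join a dict of cookies into a single Cookie header value."""
    return '; '.join('%s=%s' % (name, value) for name, value in cookies.items())


def looks_like_torrent(body):
    """Heuristic only: a bencoded dictionary starts with the byte 'd'."""
    return body.startswith(b'd')


def download_torrent(url, save_as, cookies):
    """Fetch url with the cookies attached and write the body to save_as.

    Hypothetical helper: raises ValueError if the response does not
    look like a torrent (for example an HTML error page).
    """
    # Cookies go into a header, not into the data= (POST body) argument.
    req = urllib.request.Request(url, headers={'Cookie': cookie_header(cookies)})
    with urllib.request.urlopen(req) as response:
        body = response.read()
    if not looks_like_torrent(body):
        raise ValueError('response does not look like a .torrent file')
    with open(save_as, 'wb') as out:
        out.write(body)
```

Called as download_torrent(torrent_url, save_as, cookies), this should raise urllib.error.HTTPError on a 404 instead of writing the error page to disk. Whether the site accepts the request may still depend on other headers it checks, which this sketch does not try to guess.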