How do I set a session cookie when extracting content from a URL with Beautiful Soup?

Time: 2015-05-26 05:17:51

Tags: python session cookies web-scraping beautifulsoup

Consider this code:

from bs4 import BeautifulSoup
from urllib.request import urlopen

content = urlopen('https://example.net/users/101')
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(content, "html.parser")
div_tags = soup.find_all("div", {"class": "classname"})
print(div_tags)
for div in div_tags:
    ul_tags = div.find_all("ul", {"class": "classname"})
    for ul in ul_tags:
        li_tags = ul.find_all("li")
        for li in li_tags:
            # Print the href of the first link in each list item
            name = li.find('a')['href']
            print(name)

If I instead use

content = open("try.html","r")

I get the desired output.

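Here, example.net can only be accessed after entering a username and password. The code above prints nothing, even though the parsing itself is correct. How do I add the session cookie value to this code?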
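A plausible explanation, assuming the site answers unauthenticated requests with a login page instead of the profile page, is that the fetched HTML simply contains no matching tags. The sketch below runs the same nested search over two inline HTML strings. Both strings and the helper name `extract_links` are made up for illustration and are not taken from example.net.

```python
from bs4 import BeautifulSoup


def extract_links(html, class_name="classname"):
    """Collect the href of the first <a> in every <li> under div/ul of the given class."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", {"class": class_name}):
        for ul in div.find_all("ul", {"class": class_name}):
            for li in ul.find_all("li"):
                a = li.find("a")
                if a is not None and a.has_attr("href"):
                    links.append(a["href"])
    return links


# Hypothetical page as it might look when saved from a logged-in browser
PROFILE_HTML = """
<div class="classname">
  <ul class="classname">
    <li><a href="/users/1">one</a></li>
    <li><a href="/users/2">two</a></li>
  </ul>
</div>
"""

# Hypothetical login page returned to a client without a session cookie
LOGIN_HTML = '<form action="/login"><input name="username"><input name="password"></form>'

print(extract_links(PROFILE_HTML))  # links found in the saved page
print(extract_links(LOGIN_HTML))    # empty list: nothing to match
```

If the live response resembles the second string, the parsing code is fine and the request itself needs to carry the login session.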

1 answer:

Answer 0 (score: 4)

Have you tried requests?

It can persist cookies across requests in a session.

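import requests
from bs4 import BeautifulSoup

s = requests.Session()
# Log in first; the session stores any cookies the server sets
s.post('https://example.net/users/101', data={'username': 'sup', 'password': 'pass'})
# Later requests on the same session send those cookies back
r = s.get("https://example.net/users/101")
soup = BeautifulSoup(r.text, "html.parser")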
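If the session cookie value is already known, for example copied from a logged-in browser, it can probably be set on the session directly instead of posting the login form. This is a minimal sketch: the cookie name `sessionid`, the value `abc123`, and the helper `session_with_cookie` are placeholders, since the real cookie name depends on the site. The request is only prepared, not sent, so the resulting header can be inspected offline.

```python
import requests


def session_with_cookie(name, value, domain):
    """Return a Session that already carries one cookie for the given domain."""
    s = requests.Session()
    s.cookies.set(name, value, domain=domain, path="/")
    return s


# "sessionid" and "abc123" are hypothetical; use the name and value your site issues
s = session_with_cookie("sessionid", "abc123", "example.net")

# Build the request without sending it, to see which Cookie header would go out
prepared = s.prepare_request(requests.Request("GET", "https://example.net/users/101"))
print(prepared.headers.get("Cookie"))
```

Calling `s.get(...)` on the same session should then send that cookie with the real request.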

More about requests.Session():

http://docs.python-requests.org/en/latest/user/advanced/
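To stay with `urllib.request` as in the question, one option is to attach the cookie as a plain request header. This is a rough sketch that assumes a single cookie is enough to authenticate. The name `sessionid`, the value `abc123`, and the helper `request_with_cookie` are hypothetical.

```python
from urllib.request import Request


def request_with_cookie(url, name, value):
    """Build a urllib Request that sends one cookie in its Cookie header."""
    req = Request(url)
    req.add_header("Cookie", f"{name}={value}")
    return req


# Placeholder cookie name and value; copy the real pair from a logged-in browser
req = request_with_cookie("https://example.net/users/101", "sessionid", "abc123")
print(req.get_header("Cookie"))

# The request object can then replace the bare URL in the question's code:
# content = urlopen(req)
# soup = BeautifulSoup(content, "html.parser")
```

A header set this way is static, so it stops working once the server expires the session. A `requests.Session` that logs in is usually the more robust route.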