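I have a RequestsCookieJar object (from requests) that contains multiple cookies from different domains/paths. How can I extract the cookie string for a specific domain/path, following the rules mentioned here?
For example: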
>>> r = requests.get("https://stackoverflow.com")
>>> print(r.cookies)
<RequestsCookieJar[<Cookie prov=4df137f9-848e-01c3-f01b-35ec61022540 for .stackoverflow.com/>]>
# the function I expect
>>> getCookies(r.cookies, "stackoverflow.com")
"prov=4df137f9-848e-01c3-f01b-35ec61022540"
>>> getCookies(r.cookies, "meta.stackoverflow.com")
"prov=4df137f9-848e-01c3-f01b-35ec61022540"
# meta.stackoverflow.com also matches, since it is a subdomain of .stackoverflow.com
>>> getCookies(r.cookies, "google.com")
""
# r.cookies does not contain any cookie for google.com, so it returns an empty string
Answer 0 (score: 2)
Actually, I ran into the same problem as you. But when I looked at the class definition
class RequestsCookieJar(cookielib.CookieJar, MutableMapping):
I found a method named
def get_dict(self, domain=None, path=None):
You can simply write code like this:
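from http.cookies import SimpleCookie

raw = "rawCookie"  # placeholder for a raw cookie string such as "name=value; other=value2"
print(len(raw))
mycookie = SimpleCookie()
mycookie.load(raw)
UCookie = {}
for key, morsel in mycookie.items():
    UCookie[key] = morsel.value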
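For comparison, here is a minimal sketch of the get_dict method mentioned above, run against a hand-built jar so that no network request is needed. The cookie names, values and the example.org domain are invented for illustration. As far as I can tell, get_dict compares the domain and path by plain equality, so the leading dot matters and subdomains are not matched.

```python
from requests.cookies import RequestsCookieJar

# Hand-built jar; names, values and domains are illustrative only.
jar = RequestsCookieJar()
jar.set("prov", "abc123", domain=".stackoverflow.com", path="/")
jar.set("sid", "xyz789", domain="example.org", path="/account")

# get_dict filters on an exact domain and/or an exact path.
so_cookies = jar.get_dict(domain=".stackoverflow.com")
account_cookies = jar.get_dict(path="/account")
no_dot = jar.get_dict(domain="stackoverflow.com")  # no leading dot, so no match

print(so_cookies)
print(account_cookies)
print(no_dot)
```

Because of the exact comparison, this alone does not give the subdomain behaviour the question asks for.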
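Answer 1 (score: 1)
NEW ANSWER
OK, so I still don't get what you are trying to achieve.
If you want to extract the originating URL from a RequestsCookieJar object (so that you can check whether it matches a given subdomain), that is (as far as I know) not possible.
However, you could do something like this: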
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import requests
import re

class getCookies():
    def __init__(self, url):
        self.cookiejar = requests.get(url).cookies
        self.url = url

    def check_domain(self, domain):
        try:
            # Drop the first label, e.g. "meta.stackoverflow.com" -> "stackoverflow.com"
            base_domain = re.compile(r"(?<=\.).+\..+$").search(domain).group()
        except AttributeError:
            # No match (e.g. "google.com"): use the domain as given
            base_domain = domain
        if base_domain in self.url:
            print("\"prov=" + str(dict(self.cookiejar)["prov"]) + "\"")
        else:
            print("No cookies for " + domain + " in this jar!")
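Then, if you do:
new_instance = getCookies("https://stackoverflow.com")
you can then do:
new_instance.check_domain("meta.stackoverflow.com")
which gives the output:
"prov=5d4fda78-d042-2ee9-9a85-f507df184094"
while:
new_instance.check_domain("google.com")
outputs:
No cookies for google.com in this jar!
Then, if needed, you can fine-tune the regular expression and create a list of URLs. In a first loop you create one instance per URL and store the instances in, say, a list or dict. In a second loop you check another list of URLs to see whether their cookies might be present in any of the instances.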
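As an alternative to hand-written regular expressions, the check against a list of URLs can be sketched with get_cookie_header from requests.cookies, which asks the jar which cookies it would send with a given request. The helper name cookie_header_for, the jar contents and the URL list are assumptions made for this example, and get_cookie_header is a helper requests uses internally, so treat its continued availability as an assumption too. Nothing is sent over the network.

```python
import requests
from requests.cookies import RequestsCookieJar, get_cookie_header

def cookie_header_for(jar, url):
    """Return the Cookie header the jar would send to `url`, or ""."""
    # An unsent PreparedRequest is enough; no connection is opened.
    prepared = requests.Request("GET", url).prepare()
    return get_cookie_header(jar, prepared) or ""

# Hand-built jar standing in for the cookies of a real response.
jar = RequestsCookieJar()
jar.set("prov", "abc123", domain=".stackoverflow.com", path="/")

urls = [
    "https://stackoverflow.com/",
    "https://meta.stackoverflow.com/",
    "https://google.com/",
]
results = {url: cookie_header_for(jar, url) for url in urls}
for url, header in results.items():
    print(url, repr(header))
```

The matching is delegated to the cookie policy of the standard library's http.cookiejar, so domain, path, secure flag and expiry are all taken into account rather than the domain alone.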
OLD ANSWER
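The documentation you linked explains:
items()
Dict-like items() that returns a list of name-value tuples from the jar. Allows client-code to call dict(RequestsCookieJar) and get a vanilla python dict of key value pairs.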
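To make the quoted behaviour concrete, here is a small sketch on a hand-built jar. The cookie names, values and domains are invented for illustration, and nothing is fetched.

```python
from requests.cookies import RequestsCookieJar

# Hand-built jar with unique cookie names; contents are illustrative only.
jar = RequestsCookieJar()
jar.set("prov", "abc123", domain=".stackoverflow.com", path="/")
jar.set("theme", "dark", domain="example.org", path="/")

# items() returns (name, value) tuples; dict() builds a plain dict from the jar.
pairs = jar.items()
flat = dict(jar)

print(pairs)
print(flat)
```

Note that the plain dict keeps only names and values; the domain and path of each cookie are dropped.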
I think what you are looking for is:
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import requests

def getCookies(url):
    r = requests.get(url)
    print("\"prov=" + str(dict(r.cookies)["prov"]) + "\"")
Now I can run it like this:
>>> getCookies("https://stackoverflow.com")
"prov=f7712c78-b489-ee5f-5e8f-93c85ca06475"
Answer 2 (score: 1)
I think you need to use a Python dictionary of the cookies. (See my comment above.)
def getCookies(cookie_jar, domain):
    cookie_dict = cookie_jar.get_dict(domain=domain)
    found = ['%s=%s' % (name, value) for (name, value) in cookie_dict.items()]
    return ';'.join(found)
Your example:
>>> r = requests.get("https://stackoverflow.com")
>>> getCookies(r.cookies, ".stackoverflow.com")
"prov=4df137f9-848e-01c3-f01b-35ec61022540"
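Answer 3 (score: 0)
The code below is not guaranteed to be "forward compatible", because I am accessing attributes of a class that its authors deliberately hid (sort of); however, if you must get at a cookie's attributes, have a look here: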
import requests

aresponse = requests.get('https://www.att.com')
requestscookiejar = aresponse.cookies
# _cookies is a private mapping: domain -> path -> name -> Cookie
for cdomain, cooks in requestscookiejar._cookies.items():
    for cpath, cookgrp in cooks.items():
        for cname, cattribs in cookgrp.items():
            print(cattribs.version)
            print(cattribs.name)
            print(cattribs.value)
            print(cattribs.port)
            print(cattribs.port_specified)
            print(cattribs.domain)
            print(cattribs.domain_specified)
            print(cattribs.domain_initial_dot)
            print(cattribs.path)
            print(cattribs.path_specified)
            print(cattribs.secure)
            print(cattribs.expires)
            print(cattribs.discard)
            print(cattribs.comment)
            print(cattribs.comment_url)
            print(cattribs.rfc2109)
            print(cattribs._rest)
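The same attributes can also be read without the private _cookies mapping, because iterating over a jar yields the Cookie objects directly. Below is a rough sketch that uses cookie.domain to build the string the question asks for. The names domain_matches and get_cookies are hypothetical, the jar is hand-built, and the matching is deliberately simplified: it ignores path, secure flag, expiry and host-only cookies.

```python
from requests.cookies import RequestsCookieJar

def domain_matches(host, cookie_domain):
    """True if `host` equals the cookie domain or is a subdomain of it."""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)

def get_cookies(cookie_jar, host):
    """Join every cookie whose domain covers `host` into one string."""
    found = [
        "%s=%s" % (cookie.name, cookie.value)
        for cookie in cookie_jar  # public iteration, no _cookies needed
        if domain_matches(host, cookie.domain)
    ]
    return "; ".join(found)

# Hand-built jar; the cookie value is illustrative only.
jar = RequestsCookieJar()
jar.set("prov", "abc123", domain=".stackoverflow.com", path="/")

for host in ("stackoverflow.com", "meta.stackoverflow.com", "google.com"):
    print(host, repr(get_cookies(jar, host)))
```

The suffix check requires a dot before the cookie domain, so a host such as notstackoverflow.com does not match.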
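When you only need simple access to cookie attributes, the following approach may be less complicated. It avoids RequestsCookieJar altogether: we build a SimpleCookie instance from the response object's headers attribute instead of its cookies attribute. The name SimpleCookie seems to imply a single cookie, but that is not what it is; one SimpleCookie instance can hold several cookies. Try it: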
import http.cookies
import requests

def parse_cookies(http_response):
    cookie_grp = http.cookies.SimpleCookie()
    for h, v in http_response.headers.items():
        if 'set-cookie' in h.lower():
            # Caveat: splitting on ',' also splits inside an Expires date
            # such as "Wed, 21 Oct 2015 07:28:00 GMT".
            for cook in v.split(','):
                cookie_grp.load(cook)
    return cookie_grp

aresponse = requests.get('https://www.att.com')
cookies = parse_cookies(aresponse)
print(str(cookies))
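To see what a SimpleCookie holds without making a request, here is a small offline sketch. The two Set-Cookie strings are invented for illustration; each load() call parses one of them, and the resulting entries (morsels) expose attributes such as domain and path by key.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()
# Each load() call parses one Set-Cookie value; the strings are made up.
cookies.load("prov=abc123; Domain=.stackoverflow.com; Path=/; HttpOnly")
cookies.load("theme=dark; Path=/settings")

for name, morsel in cookies.items():
    # Attributes that were not present come back as empty strings.
    print(name, morsel.value, repr(morsel["domain"]), morsel["path"])
```

Both cookies end up in the same SimpleCookie instance, which is the point made above about the name being misleading.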