Why does logging into a website with cookies fail in Python 3?

Date: 2018-03-07 09:43:43

Tags: python-3.x cookies scrapy-spider

I want to use cookies to access a website that only shows certain information after a user logs in. But when I try it, the result shows that the user is not logged in to the site. Here is my code. Can someone tell me how to fix the problem?

    import re
    import http.cookiejar
    import urllib.error
    import urllib.parse
    import urllib.request
    from bs4 import BeautifulSoup

    LOGIN_URL = "https://www.yaozh.com/login/"
    values = {'username': 'username', 'pwd': 'password'}  # , 'submit': 'Login'
    postdata = urllib.parse.urlencode(values).encode()
    user_agent = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36')
    headers = {'User-Agent': user_agent, 'Connection': 'keep-alive'}

    cookie_filename = 'cookie.txt'
    cookie = http.cookiejar.MozillaCookieJar(cookie_filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    request = urllib.request.Request(LOGIN_URL, postdata, headers)

    try:
        response = opener.open(request)
        page = response.read().decode()
        # print(page)
    except urllib.error.URLError as e:
        print(e.code, ':', e.reason)
    cookie.save(ignore_discard=True, ignore_expires=True)
    print(cookie)
    for item in cookie:
        print('Name = ' + item.name)
        print('Value = ' + item.value)

    get_url = 'https://db.yaozh.com/instruct?p=1&pageSize=20'
    get_request = urllib.request.Request(get_url, headers=headers)
    get_response = opener.open(get_request)
    html = get_response.read()  # read the body once; the response object is consumed after this
    print(html)
    bs = BeautifulSoup(html, "html.parser")
    urls = bs.find_all(name='a', attrs={"href": re.compile(r"\.doc")}, recursive=True)
    print(len(urls))
    for url in urls:
        print(url["href"])

1 Answer:

Answer 0 (score: 0)

The problem has been solved. If you run into the same issue, I suggest you check the data you are POSTing to the server. Many websites require some information that real users never see (such as hidden form fields or tokens) to judge whether the requester is a real user. Good luck!
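To illustrate the idea above: before POSTing the credentials, fetch the login page and copy every hidden `<input>` field (CSRF tokens and the like) into the form data. A minimal sketch using only the standard library; the HTML snippet and the field names `csrf_token` and `formhash` are hypothetical stand-ins for whatever the real login page contains:

```python
import urllib.parse
from html.parser import HTMLParser

# Hypothetical login-page HTML; a real site would be fetched with opener.open(LOGIN_URL).
login_page = """
<form action="/login/" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="hidden" name="formhash" value="deadbeef">
  <input type="text" name="username">
  <input type="password" name="pwd">
</form>
"""

class HiddenFieldCollector(HTMLParser):
    """Collect the name/value pairs of all hidden <input> elements."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            d = dict(attrs)
            if d.get("type") == "hidden" and "name" in d:
                self.fields[d["name"]] = d.get("value", "")

parser = HiddenFieldCollector()
parser.feed(login_page)

# Merge the hidden fields into the credentials before encoding the POST body.
values = {"username": "username", "pwd": "password"}
values.update(parser.fields)
postdata = urllib.parse.urlencode(values).encode()
print(sorted(values))  # → ['csrf_token', 'formhash', 'pwd', 'username']
```

With the hidden fields included, the POST body matches what a real browser would submit from that form.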