I can't get all of the links on a page

Asked: 2017-01-21 12:56:12

Tags: python cookies web-scraping urllib user-agent

I'm using BeautifulSoup and urllib to scrape a web page. I've set a user agent and a cookie, but I still can't get all of the links from the page... Here's my code:

import bs4 as bs
import urllib.request
import requests
#sauce = urllib.request.urlopen('https://github.com/search?q=javascript&type=Code&utf8=%E2%9C%93').read()
#soup = bs.BeautifulSoup(sauce,'lxml')

'''
session = requests.Session()
response = session.get(url)
print(session.cookies.get_dict())
'''

url = 'https://github.com/search?q=javascript&type=Code&utf8=%E2%9C%93'


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
           # Multiple cookies go in one header value, separated by "; ",
           # each in name=value form
           'Cookie': '_gh_sess=eyJzZXNzaW9uX2lkIjoiMDNhMGI2NjQxZjY4Mjc1YmQ3ZjAyNmJiODM2YzIzMTUiLCJfY3NyZl90b2tlbiI6IlJJOUtrd3E3WVFOYldVUzkwdmUxZ0Z4MHZLN3M2eE83SzhIdVJTUFVsVVU9In0%3D--4485d36d4c86aec01cde254e34db68005193546e; logged_in=no'}
response = requests.get(url,headers=headers)

print(response.cookies)

soup = bs.BeautifulSoup(response.content,'lxml')
# use a different name than "url" so the request URL isn't shadowed
for link in soup.find_all('a'):
    print(link.get('href'))

Is there something I'm missing? In the browser I get links to all of the code results, but in the script I only get a few links and none of the code...
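One likely cause (an assumption, since the actual response isn't shown here): `find_all('a')` can only ever return anchors that are present in the raw HTML the server sends back. Links that the browser adds afterwards via JavaScript, or that GitHub only serves to authenticated sessions, will never appear in `response.content`. A minimal sketch with a hypothetical HTML snippet illustrates this:

```python
import bs4 as bs

# Hypothetical markup standing in for what the server actually returns:
# the static navigation links are present, but the search results are
# only injected into the empty div later, by JavaScript in the browser.
html = '''
<html><body>
  <a href="/login">Sign in</a>
  <a href="/features">Features</a>
  <div id="code_search_results"><!-- filled in client-side --></div>
</body></html>
'''

soup = bs.BeautifulSoup(html, 'html.parser')
links = [a.get('href') for a in soup.find_all('a')]
print(links)  # only the two static links; nothing from the empty div
```

If that is what's happening, the usual options are to log in with a real authenticated session (or the GitHub Search API) so the server includes the results in the HTML, or to render the page with a browser-driving tool instead of parsing the raw response.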

The webpage opens perfectly in the browser...

0 Answers:

No answers yet