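Python web scrape cannot resolve some hyperlinks

Time: 2017-12-11 12:02:02

Tags: python web-scraping python-requests

When scraping some web pages, I don't get the same source that I see when inspecting the page in a browser. Where the browser shows an actual hyperlink, the source fetched with requests shows only {url}. Below is sample code for a sample page.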

import requests
from bs4 import BeautifulSoup as bs
page = requests.get("https://www.mckinsey.com/search?q=iot")
soup = bs(page.content, 'html.parser')
soup.findAll('div', {'class' : 'item title-link'})

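If you inspect the elements selected in the last line in a browser, each one contains a full URL. In the version fetched with requests it just says {url}, so the soup lookup comes back empty.

1 answer:

Answer 0: (score: 1)

This portal uses JavaScript to fetch data from the server and place it on the page.

Using DevTools in Chrome/Firefox, you can see that the JavaScript sends a POST request with its parameters as JSON and receives all the data back as JSON. If you make that request yourself, you get everything as a dictionary.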

import requests

# Parameters the page's JavaScript sends, as seen in DevTools
params = {
    'q': 'iot',
    'page': '1',
    'app': '',
    'sort': 'default',
    'ignoreSpellSuggestion': 'false',
}

url = 'https://www.mckinsey.com/services/ContentAPI/SearchAPI.svc/search'

# Fetch the first two result pages
for page in range(1, 3):

    params['page'] = str(page)

    # Send the parameters as a JSON body
    r = requests.post(url, json=params)
    r.raise_for_status()  # stop early on an HTTP error

    data = r.json()

    print()
    print("data['data'].keys():\n", data['data'].keys())
    print()
    print(' currentPage:', data['data']['currentPage'])
    print('  totalPages:', data['data']['totalPages'])
    print('totalResults:', data['data']['totalResults'])
    print()

    print("data['data']['results'][0].keys():\n", data['data']['results'][0].keys())
    print()

    # Each result is a dictionary holding the title and the full URL
    for item in data['data']['results']:
        print(item['title'])
        print(item['url'])
        print('---')

Result:

data['data'].keys():
 dict_keys(['totalResults', 'totalPages', 'currentPage', 'recommendations', 'results'])

 currentPage: 1
  totalPages: 17
totalResults: 166

data['data']['results'][0].keys():
 dict_keys(['title', 'subtitle', 'imageurl', 'dek', 'tag', 'mimetype', 'url'])

Taking the pulse of enterprise <b>IoT</b>
https://www.mckinsey.com/global-themes/internet-of-things/our-insights/taking-the-pulse-of-enterprise-iot
---
An executive&#39;s guide to the Internet of Things
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/an-executives-guide-to-the-internet-of-things
---
Internet of Things | Internet of Things
https://www.mckinsey.com/global-themes/internet-of-things/how-we-help-clients
---
Unlocking the potential of the Internet of Things
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/the-internet-of-things-the-value-of-digitizing-the-physical-world
---
Internet of Things
https://www.mckinsey.com/global-themes/internet-of-things/our-insights
---
Six ways CEOs can promote cybersecurity in the <b>IoT</b> age
https://www.mckinsey.com/global-themes/internet-of-things/our-insights/six-ways-ceos-can-promote-cybersecurity-in-the-iot-age
---
What&#39;s new with the Internet of Things?
https://www.mckinsey.com/industries/semiconductors/our-insights/whats-new-with-the-internet-of-things
---
How can we recognize the real power of the Internet of Things?
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/how-can-we-recognize-the-real-power-of-the-internet-of-things
---
Making sense of Internet of Things platforms
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/making-sense-of-internet-of-things-platforms
---
Partnerships, scale, and speed: The hallmarks of a successful <b>IoT</b> strategy
https://www.mckinsey.com/industries/financial-services/our-insights/partnerships-scale-and-speed
---

data['data'].keys():
 dict_keys(['totalResults', 'totalPages', 'currentPage', 'recommendations', 'results'])

 currentPage: 2
  totalPages: 17
totalResults: 166

data['data']['results'][0].keys():
 dict_keys(['title', 'subtitle', 'imageurl', 'dek', 'tag', 'mimetype', 'url'])

THE INTERNET OF THINGS: MAPPING THE VALUE BEYOND THE HYPE
https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20internet%20of%20things%20the%20value%20of%20digitizing%20the%20physical%20world/unlocking_the_potential_of_the_internet_of_things_executive_summary.ashx
---
The future of connectivity: Enabling the Internet of Things
https://www.mckinsey.com/global-themes/internet-of-things/our-insights/the-future-of-connectivity-enabling-the-internet-of-things
---
THE INTERNET OF THINGS: MAPPING THE VALUE BEYOND THE HYPE
https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20internet%20of%20things%20the%20value%20of%20digitizing%20the%20physical%20world/the-internet-of-things-mapping-the-value-beyond-the-hype.ashx
---
Insurers need to plug into the Internet of Things – or risk falling behind
https://www.mckinsey.com/~/media/mckinsey/industries/financial%20services/our%20insights/european%20insurance%20practice%20report%20on%20internet%20of%20things/mckinsey%20-%20insurers%20need%20to%20plug%20into%20the%20internet%20of%20things%20or%20risk%20falling%20behind.ashx
---
Security in the Internet of Things
https://www.mckinsey.com/industries/semiconductors/our-insights/security-in-the-internet-of-things
---
Semiconductors
https://www.mckinsey.com/~/media/mckinsey/industries/semiconductors/our%20insights/mckinsey%20on%20semiconductors%20issue%206%20-%20spring%202017/mck%20on%20semiconductors_issue%206_2017.ashx
---
Internet of Things: Opportunities and challenges for semiconductor companies
https://www.mckinsey.com/industries/semiconductors/our-insights/internet-of-things-opportunities-and-challenges-for-semiconductor-companies
---
THE INTERNET OF THINGS: MAPPING THE VALUE BEYOND THE HYPE
https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20internet%20of%20things%20the%20value%20of%20digitizing%20the%20physical%20world/unlocking_the_potential_of_the_internet_of_things_full_report.ashx
---
A new Internet of Things platform and business | Digital McKinsey
https://www.mckinsey.com/business-functions/digital-mckinsey/how-we-help-clients/a-new-internet-of-things-platform-and-business
---
Video meets the Internet of Things
https://www.mckinsey.com/industries/high-tech/our-insights/video-meets-the-internet-of-things
---
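
The output above shows two things worth handling: titles still carry markup such as <b>IoT</b> and entities such as &#39;, and the response reports totalPages, so the page range does not have to be hardcoded. Below is a minimal sketch of both, not a definitive implementation. The names clean_title, iter_results and the fetch argument are hypothetical helpers introduced here for illustration. The sketch assumes totalPages is an integer or a numeric string, and that the response keeps the structure shown above.

```python
import html
import re

# Matches any HTML tag, e.g. the <b>...</b> highlighting seen in the titles
TAG_RE = re.compile(r"<[^>]+>")


def clean_title(raw):
    """Remove HTML tags, then decode entities such as &#39;."""
    return html.unescape(TAG_RE.sub("", raw))


def iter_results(fetch, query):
    """Yield every result for `query`, stopping at the reported totalPages.

    `fetch` is a hypothetical hook: any callable that takes the params
    dict and returns the decoded JSON response as a dictionary.
    """
    params = {
        "q": query,
        "page": "1",
        "app": "",
        "sort": "default",
        "ignoreSpellSuggestion": "false",
    }
    page = 1
    while True:
        params["page"] = str(page)
        data = fetch(params)["data"]
        yield from data["results"]
        # Assumption: totalPages is an int or a numeric string
        if page >= int(data["totalPages"]):
            break
        page += 1
```

With the request from the answer, fetch could be something like lambda p: requests.post(url, json=p).json(), and each item['title'] can be passed through clean_title before printing. Passing the network call in as an argument keeps the paging logic easy to try out without hitting the server.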