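When scraping some web pages, I don't get the same source that I see when inspecting the page in the browser. In the browser the hyperlinks contain the actual URLs, but in the fetched source they show up as {url}. Below is sample code for an example page.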
import requests
from bs4 import BeautifulSoup as bs
page = requests.get("https://www.mckinsey.com/search?q=iot")
soup = bs(page.content, 'html.parser')
soup.findAll('div', {'class' : 'item title-link'})
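If I inspect the elements matched by the last line in the browser, each one holds a complete URL. In the version fetched with requests it just says {url}, so the data I extract from the soup object comes back empty.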
Answer 0 (score: 1)
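This portal uses JavaScript to fetch the data from the server and put it on the page. requests only downloads the initial HTML and does not run JavaScript, which is why you see the unfilled {url} placeholder.
Using DevTools in Chrome/Firefox, you can see that the JavaScript sends a POST request with JSON parameters and receives all the data back as JSON. If you send the same request yourself, you get everything as a dictionary.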
import requests
params = {
'q': 'iot',
'page': '1',
'app': '',
'sort': 'default',
'ignoreSpellSuggestion': 'false',
}
url = 'https://www.mckinsey.com/services/ContentAPI/SearchAPI.svc/search'
# Request the first two result pages; the API expects the page number as a string.
for page in range(1, 3):
    params['page'] = str(page)
    # json=params sends the parameters as a JSON body.
    r = requests.post(url, json=params)
    data = r.json()
    print()
    print("data['data'].keys():\n", data['data'].keys())
    print()
    print(' currentPage:', data['data']['currentPage'])
    print(' totalPages:', data['data']['totalPages'])
    print('totalResults:', data['data']['totalResults'])
    print()
    print("data['data']['results'][0].keys():\n", data['data']['results'][0].keys())
    print()
    # Each result is a dictionary with the title and the real URL.
    for item in data['data']['results']:
        print(item['title'])
        print(item['url'])
        print('---')
Result:
data['data'].keys():
dict_keys(['totalResults', 'totalPages', 'currentPage', 'recommendations', 'results'])
currentPage: 1
totalPages: 17
totalResults: 166
data['data']['results'][0].keys():
dict_keys(['title', 'subtitle', 'imageurl', 'dek', 'tag', 'mimetype', 'url'])
Taking the pulse of enterprise <b>IoT</b>
https://www.mckinsey.com/global-themes/internet-of-things/our-insights/taking-the-pulse-of-enterprise-iot
---
An executive's guide to the Internet of Things
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/an-executives-guide-to-the-internet-of-things
---
Internet of Things | Internet of Things
https://www.mckinsey.com/global-themes/internet-of-things/how-we-help-clients
---
Unlocking the potential of the Internet of Things
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/the-internet-of-things-the-value-of-digitizing-the-physical-world
---
Internet of Things
https://www.mckinsey.com/global-themes/internet-of-things/our-insights
---
Six ways CEOs can promote cybersecurity in the <b>IoT</b> age
https://www.mckinsey.com/global-themes/internet-of-things/our-insights/six-ways-ceos-can-promote-cybersecurity-in-the-iot-age
---
What's new with the Internet of Things?
https://www.mckinsey.com/industries/semiconductors/our-insights/whats-new-with-the-internet-of-things
---
How can we recognize the real power of the Internet of Things?
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/how-can-we-recognize-the-real-power-of-the-internet-of-things
---
Making sense of Internet of Things platforms
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/making-sense-of-internet-of-things-platforms
---
Partnerships, scale, and speed: The hallmarks of a successful <b>IoT</b> strategy
https://www.mckinsey.com/industries/financial-services/our-insights/partnerships-scale-and-speed
---
data['data'].keys():
dict_keys(['totalResults', 'totalPages', 'currentPage', 'recommendations', 'results'])
currentPage: 2
totalPages: 17
totalResults: 166
data['data']['results'][0].keys():
dict_keys(['title', 'subtitle', 'imageurl', 'dek', 'tag', 'mimetype', 'url'])
THE INTERNET OF THINGS: MAPPING THE VALUE BEYOND THE HYPE
https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20internet%20of%20things%20the%20value%20of%20digitizing%20the%20physical%20world/unlocking_the_potential_of_the_internet_of_things_executive_summary.ashx
---
The future of connectivity: Enabling the Internet of Things
https://www.mckinsey.com/global-themes/internet-of-things/our-insights/the-future-of-connectivity-enabling-the-internet-of-things
---
THE INTERNET OF THINGS: MAPPING THE VALUE BEYOND THE HYPE
https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20internet%20of%20things%20the%20value%20of%20digitizing%20the%20physical%20world/the-internet-of-things-mapping-the-value-beyond-the-hype.ashx
---
Insurers need to plug into the Internet of Things – or risk falling behind
https://www.mckinsey.com/~/media/mckinsey/industries/financial%20services/our%20insights/european%20insurance%20practice%20report%20on%20internet%20of%20things/mckinsey%20-%20insurers%20need%20to%20plug%20into%20the%20internet%20of%20things%20or%20risk%20falling%20behind.ashx
---
Security in the Internet of Things
https://www.mckinsey.com/industries/semiconductors/our-insights/security-in-the-internet-of-things
---
Semiconductors
https://www.mckinsey.com/~/media/mckinsey/industries/semiconductors/our%20insights/mckinsey%20on%20semiconductors%20issue%206%20-%20spring%202017/mck%20on%20semiconductors_issue%206_2017.ashx
---
Internet of Things: Opportunities and challenges for semiconductor companies
https://www.mckinsey.com/industries/semiconductors/our-insights/internet-of-things-opportunities-and-challenges-for-semiconductor-companies
---
THE INTERNET OF THINGS: MAPPING THE VALUE BEYOND THE HYPE
https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20internet%20of%20things%20the%20value%20of%20digitizing%20the%20physical%20world/unlocking_the_potential_of_the_internet_of_things_full_report.ashx
---
A new Internet of Things platform and business | Digital McKinsey
https://www.mckinsey.com/business-functions/digital-mckinsey/how-we-help-clients/a-new-internet-of-things-platform-and-business
---
Video meets the Internet of Things
https://www.mckinsey.com/industries/high-tech/our-insights/video-meets-the-internet-of-things
---
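The output above shows two things that a follow-up script might want to handle: the response reports totalPages, and some titles contain <b>...</b> highlight markup. The following is a minimal sketch of both, not part of the original answer. The names iter_results, strip_highlight and fetch_page are hypothetical, and the sketch assumes every response has the same data['totalPages'] and data['results'] layout as the pages printed above. The demo uses a stubbed fetch function so that it runs without network access.

```python
import re
from typing import Callable, Iterator


def iter_results(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every result item, requesting pages until totalPages is reached.

    fetch_page is a hypothetical callable that takes a 1-based page number
    and returns the decoded JSON response as a dictionary.
    """
    page = 1
    while True:
        data = fetch_page(page)['data']
        yield from data['results']
        # Stop once the last page reported by the server has been read.
        if page >= int(data['totalPages']):
            break
        page += 1


def strip_highlight(title: str) -> str:
    """Remove the <b>...</b> search-highlight tags seen in some titles."""
    return re.sub(r'</?b>', '', title)


if __name__ == '__main__':
    # Stubbed two-page response, shaped like the output printed above.
    fake_pages = {
        1: {'data': {'totalPages': 2, 'results': [
            {'title': 'Taking the pulse of enterprise <b>IoT</b>', 'url': 'page-1-url'},
        ]}},
        2: {'data': {'totalPages': 2, 'results': [
            {'title': 'Security in the Internet of Things', 'url': 'page-2-url'},
        ]}},
    }
    for item in iter_results(lambda page: fake_pages[page]):
        print(strip_highlight(item['title']))
        print(item['url'])
        print('---')
```

To use it against the real endpoint, fetch_page could be a small wrapper that sets params['page'] = str(page) and returns requests.post(url, json=params).json(), exactly as in the loop from the answer. Adding a short pause between requests would be a reasonable courtesy to the server.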