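I'm trying to scrape the URL below, but the response doesn't contain what I see when I open the page in a browser (the text of a public customer case/story). I've also tried sending headers that mimic a real browser, with no luck so far. Any idea what I'm missing?
URL: https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365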
import requests
main_url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
result = requests.get(main_url)
print(result.text)
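Answer 0 (score: 1)
The page fetches its content from an API (in the browser, via JavaScript). requests does not execute JavaScript, so the story text never shows up in result.text. You can call that API directly: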
GET https://customers.microsoft.com/en-us/api/search?key=STORY_KEY
where STORY_KEY is 767633-asos-retailer-azure-active-directory-m365, i.e. the text after the last slash in the page URL. You can use a Python script like this:
import requests

url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"

# Call the search API with the story key (the last path segment of the page URL)
r = requests.get(
    "https://customers.microsoft.com/en-us/api/search",
    params={"key": url.rsplit("/", 1)[1]},
)
r.raise_for_status()  # fail loudly on a 4xx/5xx response instead of a confusing JSON error

# The story fields live under "search_document"
document = r.json()["search_document"]
summary = document["story_exec_summary"]
body = document["story_body_text_2"]
quote1 = document["story_quote_carousel"]
quote2 = document["story_quote_carousel_2"]

print(summary)
print(body)
print(quote1)
print(quote2)
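One caveat: taking the text after the last slash with rsplit returns an empty string if the story URL ends with a slash, and keeps any ?query part if there is one. A slightly more defensive sketch using only the standard library is below; the helper names story_key and api_url are made up for illustration, and the endpoint is the one given above (I haven't checked that it is still served).

```python
from urllib.parse import urlencode, urlsplit

# Endpoint from this answer; assumed, not verified to still exist
API_ENDPOINT = "https://customers.microsoft.com/en-us/api/search"

def story_key(story_url):
    """Return the last non-empty path segment of a story URL (hypothetical helper)."""
    path = urlsplit(story_url).path  # drops any ?query and #fragment
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError(f"no story key in {story_url!r}")
    return segments[-1]

def api_url(story_url):
    """Build the search API URL for a story page (hypothetical helper)."""
    return f"{API_ENDPOINT}?{urlencode({'key': story_key(story_url)})}"

url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
print(story_key(url + "/?ref=home"))
print(api_url(url))
```

The resulting URL can be passed straight to requests.get, or you can keep using params= as above and just pass story_key(url) as the value.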
Note that you'll need to look through the document object for any other data you want (videos, body3, etc.).
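If you're not sure what those other fields are called, a small helper can list the keys that contain a keyword. This is only a sketch: find_fields is a made-up name, and the sample dictionary stands in for the real response. Only story_exec_summary, story_body_text_2 and story_quote_carousel come from the code above; story_body_text_3 is a guess based on "body3".

```python
def find_fields(document, keyword):
    """Return the keys of `document` that contain `keyword` (case-insensitive), sorted."""
    keyword = keyword.lower()
    return sorted(key for key in document if keyword in key.lower())

# Made-up stand-in for r.json()["search_document"]; values are placeholders
sample_document = {
    "story_exec_summary": "summary text",
    "story_body_text_2": "body text",
    "story_body_text_3": "more body text",  # hypothetical key, guessed from "body3"
    "story_quote_carousel": "quote text",
}

print(find_fields(sample_document, "body"))
print(find_fields(sample_document, "QUOTE"))
```

With the real response you would call something like find_fields(r.json()["search_document"], "video") and then read the keys it returns.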
Answer 1 (score: 0)
You need to handle the certificates properly. This requires a couple of extra packages:
pip install certifi
pip install urllib3
Then use a different HTTP library, urllib3, instead of requests:
python
Python 3.7.7 (default, Mar 10 2020, 15:43:33)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> import certifi
>>> import urllib3
>>>
>>> http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
>>> main_url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
>>>
>>> r = http.request('GET', main_url)
>>> r.status
200
>>> r.data
>>> open("stories.html", "wb").write(r.data)
Output:
>>> r.data
b'\r\n<!doctype html>\r\n<html lang="en" xml:lang="en" dir="ltr">\r\n<head prefix="og: http://ogp.me/ns#">\r\n <meta charset="utf-8" />\r\n <meta name="viewport" content="width=device-width, initial-scale=1.0" />\r\n <meta name="description" content="Microsoft customer stories. See how Microsoft tools help companies run their business.">\r\n <meta name="keywords" content="Microsoft, customers, stories, business, software, tools, services, use case, global, collaboration, vendor, story sear .....
Let me know if this helps.