Unable to scrape a specific website using Python requests

Time: 2020-06-06 21:47:55

Tags: python web-scraping python-requests

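I am trying to scrape the URL below, but the response does not include the content I see when I open the page in a browser (the text of the public customer case/story). I also tried sending headers to simulate a real browser, but so far without success. Does anyone have an approach that works?

URL: https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365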

import requests
main_url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
result = requests.get(main_url)   
print(result.text)

2 answers:

Answer 0 (score: 1)

The page uses an external API to fetch its data. You only need to make the following call:

GET https://customers.microsoft.com/en-us/api/search?key=STORY_KEY

Here STORY_KEY is 767633-asos-retailer-azure-active-directory-m365, i.e. the text after the last slash in the URL. You can use a script like the following:

import requests

url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"

r = requests.get(
    "https://customers.microsoft.com/en-us/api/search",
    params = {
        "key": url.rsplit('/', 1)[1]
    }
)
document = r.json()["search_document"]

summary = document["story_exec_summary"]
body = document["story_body_text_2"]
quote1 = document["story_quote_carousel"]
quote2 = document["story_quote_carousel_2"]

print(summary)
print(body)
print(quote1)
print(quote2)
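The script above takes the story key with `url.rsplit('/', 1)[1]`, which returns an empty string if the URL ends with a slash and keeps any query string. A minimal sketch of a slightly more tolerant extraction follows; the helper name `story_key_from_url` and the `?ref=home` query string in the example are hypothetical, not part of the site's API:

```python
from urllib.parse import urlparse


def story_key_from_url(url):
    """Return the last non-empty path segment of a story URL.

    Hypothetical helper: it only parses the URL string and makes no request.
    """
    # urlparse separates the path from any query string or fragment.
    path = urlparse(url).path
    # Drop trailing slashes so the last segment is never empty.
    return path.rstrip("/").rsplit("/", 1)[-1]


example_url = (
    "https://customers.microsoft.com/en-us/story/"
    "767633-asos-retailer-azure-active-directory-m365/?ref=home"
)
print(story_key_from_url(example_url))
```

The returned value can then be passed as the `key` parameter in the `requests.get` call shown above.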

Note that you need to search the document object for the data you want (videos, body3, etc.).
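One way to do that search is to list the fields that actually carry content. The sketch below assumes `document` is a plain dict, as returned by `r.json()["search_document"]`. The helper name `non_empty_fields`, the `story_` prefix filter, and the sample data are illustrative assumptions; only `story_exec_summary` and `story_body_text_2` are field names taken from the script above.

```python
def non_empty_fields(document, prefix="story_"):
    """Return the entries whose key starts with `prefix` and whose value is not empty."""
    return {
        key: value
        for key, value in document.items()
        if key.startswith(prefix) and value not in (None, "", [], {})
    }


# Hypothetical stand-in for r.json()["search_document"]; real responses will differ.
sample_document = {
    "id": 767633,
    "story_exec_summary": "Summary text",
    "story_body_text_2": "Body text",
    "story_body_text_3": "",
}

for key, value in sorted(non_empty_fields(sample_document).items()):
    # Truncate long values so the overview stays readable.
    print(key, "->", str(value)[:60])
```

Running this against the real `document` gives a quick overview of which keys are worth printing in full.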

Answer 1 (score: 0)

You need to handle certificates properly. This requires additional packages:

pip install certifi
pip install urllib3

We need to use another Python library, namely urllib3:

python
Python 3.7.7 (default, Mar 10 2020, 15:43:33)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> import certifi
>>> import urllib3
>>>
>>> http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
>>> main_url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
>>>
>>> r = http.request('GET', main_url)
>>> r.status
200
>>> r.data

>>> open("stories.html", "wb").write(r.data)

Output:

>>> r.data
b'\r\n<!doctype html>\r\n<html lang="en" xml:lang="en" dir="ltr">\r\n<head prefix="og: http://ogp.me/ns#">\r\n    <meta charset="utf-8" />\r\n    <meta name="viewport" content="width=device-width, initial-scale=1.0" />\r\n    <meta name="description" content="Microsoft customer stories. See how Microsoft tools help companies run their business.">\r\n    <meta name="keywords" content="Microsoft, customers, stories, business, software, tools, services, use case, global, collaboration, vendor, story sear .....

Let me know if this helps.