Trouble fetching a phone number using requests

Date: 2019-03-27 15:15:03

Tags: python python-3.x web-scraping

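I've written a script in Python to fetch a phone number from a webpage. The number is tied to a JavaScript link labelled Phone Us. I know I could use Selenium to click that link and wait until the number becomes visible before parsing it, but I'm not interested in going that route.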

Main link


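However, when I click the link manually with Chrome DevTools open and watch the network activity in the XHR tab, I can find this link https://www.cv-library.co.uk/account-contact-details?id=192205 along with the following headers, which produce a JSON response containing the phone number I'm after.

Headers taken from Chrome DevTools: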

:authority: www.cv-library.co.uk
:method: GET
:path: /account-contact-details?id=192205
:scheme: https
accept: application/json, text/javascript, */*; q=0.01
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9,bn;q=0.8
cookie: job_search_bar_variant=variant_C_labels_above; _ga=GA1.3.807796815.1553681717; _gid=GA1.3.728310157.1553681717; _gcl_au=1.1.1379982900.1553681717; _fbp=fb.2.1553681722126.942064476; tempbasket=1553681845451186016; ui_hidecookienotice=1; session=1553697454.46289%3ABQkDAAAAAA%3D%3D%3A375400f1f62664342b2c0bd1e6bcd9c89170768b; _gat_UA-23741307-1=1
referer: https://www.cv-library.co.uk/list-jobs/276692/Allen-York-Built-and-Natural-Environment-Ltd
user-agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36
x-requested-with: XMLHttpRequest
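Copying these headers into a dict by hand is error-prone. As a minimal sketch, the pasted lines can be converted programmatically; the helper name parse_devtools_headers is hypothetical and not part of requests. It skips the HTTP/2 pseudo-headers (:authority, :method, :path, :scheme), which describe the request line and would be rejected as invalid header names. By default it also drops cookie (tied to one browser session) and accept-encoding (advertising br may leave the body compressed when no Brotli package is installed, and requests sets its own value). Those defaults are assumptions; adjust drop as needed.

```python
def parse_devtools_headers(raw, drop=("cookie", "accept-encoding")):
    """Convert 'name: value' lines copied from Chrome DevTools into a dict.

    Lines starting with ':' are HTTP/2 pseudo-headers (':authority',
    ':method', ':path', ':scheme') and are skipped. Header names listed in
    `drop` (compared in lower case) are skipped too.
    """
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # blank line or HTTP/2 pseudo-header
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a "name: value" line, ignore it
        if name.lower() in drop:
            continue  # deliberately left out (see the drop argument)
        headers[name] = value
    return headers


raw = """\
:authority: www.cv-library.co.uk
:method: GET
accept: application/json, text/javascript, */*; q=0.01
accept-encoding: gzip, deflate, br
cookie: ui_hidecookienotice=1
x-requested-with: XMLHttpRequest
"""
print(parse_devtools_headers(raw))
# {'accept': 'application/json, text/javascript, */*; q=0.01', 'x-requested-with': 'XMLHttpRequest'}
```

The resulting dict can be passed straight to s.get(url, headers=...). Note that it keeps x-requested-with, which the question's hand-written dict left out.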

The General section looks like this:

Request URL: https://www.cv-library.co.uk/account-contact-details?id=192205
Request Method: GET
Status Code: 200 
Remote Address: 109.169.5.15:443
Referrer Policy: no-referrer-when-downgrade

The response it produces in the browser:

{email: "", telephone: "01202 888986"}

What I've tried:

import requests

url = "https://www.cv-library.co.uk/account-contact-details?id=192205"

headers = {
    'referer': 'https://www.cv-library.co.uk/list-jobs/276692/Allen-York-Built-and-Natural-Environment-Ltd',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}

with requests.Session() as s:
    res = s.get(url,headers=headers).json()
    print(res)

The error it raises:

raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 7 column 1 (char 6)
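The position in that message is informative. Python's json module skips leading whitespace, so a body consisting of six newlines followed by an HTML tag fails at char 6, which is line 7, column 1, exactly as in the traceback. That suggests (though the traceback alone does not prove it) that the server returned an HTML page instead of JSON. The sketch below reproduces the message with a made-up body and adds a hypothetical helper, decode_json_or_explain, that reports the start of the body when decoding fails.

```python
import json


def decode_json_or_explain(body, content_type=""):
    """Decode a JSON body, or raise ValueError showing what came back instead."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        snippet = body.strip()[:80]
        raise ValueError(
            f"expected JSON but got content-type {content_type!r}: {exc}; "
            f"body starts with {snippet!r}"
        ) from exc


# Made-up stand-in for an HTML response: six newlines, then markup.
fake_html_body = "\n" * 6 + "<!DOCTYPE html><html></html>"

try:
    json.loads(fake_html_body)
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 7 column 1 (char 6)
```

With requests, the helper would be called as decode_json_or_explain(res.text, res.headers.get('content-type', '')) in place of res.json().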

How can I get that phone number using requests?

1 answer:

Answer 0 (score: 2)

Try adding 'x-requested-with': 'XMLHttpRequest' to the headers:

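import requests

url = "https://www.cv-library.co.uk/account-contact-details"

headers = {
    # Marks the request as an AJAX call, as the browser does; without it
    # the server replies with something other than JSON
    'x-requested-with': 'XMLHttpRequest',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}

# requests appends this to the URL as the query string ?id=192205
payload = {'id': '192205'}

with requests.Session() as s:
    res = s.get(url, headers=headers, params=payload).json()
    print(res)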

Output:

print(res['telephone'])
01202 888986
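Putting the answer together, here is a minimal sketch of a reusable function. The name fetch_telephone, the 10-second timeout, and the error handling are assumptions, not part of the answer. The session is passed in, so anything with a requests.Session-style get() works. The id goes into params, which yields the same ?id=192205 query string the question used, so the added header is the substantive change. A failed decode becomes an error showing what came back instead of a bare JSONDecodeError.

```python
AJAX_HEADERS = {
    # Same header values the answer sends
    "x-requested-with": "XMLHttpRequest",
    "user-agent": ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"),
}
CONTACT_URL = "https://www.cv-library.co.uk/account-contact-details"


def fetch_telephone(session, account_id):
    """Return the 'telephone' field of the contact details for account_id.

    `session` is expected to behave like requests.Session: get() returns an
    object with raise_for_status(), text and json().
    """
    res = session.get(CONTACT_URL, headers=AJAX_HEADERS,
                      params={"id": account_id}, timeout=10)
    res.raise_for_status()  # HTTP 4xx/5xx become exceptions
    try:
        data = res.json()
    except ValueError as exc:  # requests' JSON decode errors subclass ValueError
        raise RuntimeError(
            f"expected JSON, body starts with {res.text.strip()[:80]!r}"
        ) from exc
    return data["telephone"]
```

Usage with the real library: with requests.Session() as s: print(fetch_telephone(s, "192205"))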