Python / JSON - How to scrape from a forbidden JSON URL

Time: 2018-07-31 01:25:55

Tags: python json selenium

The link below contains the data I need to scrape: https://jobsearch.svc.dhigroupinc.com/v1/efc/jobs/search?page=1&facets=*&countryCode2=SG&pageSize=10&currencyCode=SGD

In the preview I can see that the data is available, but it is hidden. Click the link to see the preview image: Preview of data

However, opening the link only shows: {"message":"Forbidden"}

Is there any way I can retrieve the JSON data I need, like the following?

{"data":[{"id":"307ocL4mnUnNJT5V","title":"KYC Analyst","jobLocation":{"city":"Singapore",...........

In case it is needed, here is the data from the network headers.

1) Data for network-headers

2) Data for network-headers

I have already used selenium to retrieve the data I want, but if I could fetch the JSON directly I could skip selenium and use plain requests instead. Any ideas?

1 answer:

Answer 0: (score: 1)

The only thing you seem to be missing is the API key. I'm not sure how often (if at all) it changes, but I seem to be able to make the call succeed simply by adding an x-api-key header to the request.

import json

import requests

base_url = 'https://jobsearch.svc.dhigroupinc.com/v1/efc/jobs/search'
params = {
    'page': 1,
    'facets': '*',
    'countryCode2': 'SG',
    'pageSize': 10,
    'currencyCode': 'SGD',
}
headers = {
    'x-api-key': 'zvDFWwKGZ07cpXWV37lpO5MTEzXbHgyL4rKXb39C'
}

r = requests.get(base_url, headers=headers, params=params)
r.raise_for_status()

# json.dumps only for pretty printing, r.json() is all you need
print(json.dumps(r.json(), indent=2))

Output:
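
Going by the fragment quoted in the question, the response is an object with a "data" list whose items carry "id", "title" and a nested "jobLocation" with a "city". Below is a minimal sketch of pulling those fields out of r.json(), assuming that shape holds for every item. The helper name summarize_jobs is hypothetical, and the sample is built from the question's fragment rather than from a live response.

```python
def summarize_jobs(payload):
    """Return (id, title, city) tuples from a decoded search response.

    Assumes the shape seen in the question's fragment:
    {"data": [{"id": ..., "title": ..., "jobLocation": {"city": ...}}, ...]}
    Missing keys yield None instead of raising.
    """
    rows = []
    for job in payload.get("data", []):
        # jobLocation may be absent or null; fall back to an empty dict
        location = job.get("jobLocation") or {}
        rows.append((job.get("id"), job.get("title"), location.get("city")))
    return rows


# Sample built from the fragment in the question; a real call would pass r.json().
sample = {
    "data": [
        {
            "id": "307ocL4mnUnNJT5V",
            "title": "KYC Analyst",
            "jobLocation": {"city": "Singapore"},
        }
    ]
}

for job_id, title, city in summarize_jobs(sample):
    print(job_id, title, city)
```

With the request from the answer, the call would be summarize_jobs(r.json()). Using .get() throughout is a deliberate choice here, since only one item of the real response is visible and other items may lack some fields.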