How do I send a GET request with data in Python?

Posted: 2021-01-30 18:06:35

Tags: python web-scraping python-requests data-mining data-extraction

I want to get data from this site:
https://www.techstars.com/portfolio?category=all%20companies

As you can see in the browser's Network tab, a GET request is sent to this URL:
https://datacore.techstars.com/companies?order=name&program_status=in.(session_in_progress,session_over)}&type=eq.Graduate&session=not.in.(%22%22)&offset=0&limit=50
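To see exactly which filters the page sends, the query string can be split into its parameters with Python's standard library. This is a minimal sketch using `urllib.parse` (no network access). `%22%22` decodes to an empty pair of double quotes, and the stray `}` after `session_over)` is part of the URL exactly as captured. The `eq.`, `in.` and `not.in.` prefixes look like PostgREST-style filter operators, but that is an inference from the URL, not something the site documents.

```python
from urllib.parse import urlsplit, parse_qsl

# The URL captured from the Network tab, unchanged.
url = ("https://datacore.techstars.com/companies?order=name"
       "&program_status=in.(session_in_progress,session_over)}"
       "&type=eq.Graduate&session=not.in.(%22%22)&offset=0&limit=50")

# Take only the query part and decode it into (name, value) pairs.
params = dict(parse_qsl(urlsplit(url).query))
for name, value in params.items():
    print(f"{name} = {value}")
# session is printed as: not.in.("")
```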

But when I open that URL, it says "permission denied...", and the same thing happens when I send the GET request from Python.

How can I send a GET request to this URL with the correct data?

Here is my code:

import requests
url = 'https://datacore.techstars.com/companies?order=name&program_status=in.(session_in_progress,session_over)}&type=eq.Graduate&session=not.in.(%22%22)&offset=0&limit=50'
payload = {'order':'name','program_status':'in.(session_in_progress,session_over)}', 'type':'eq.Graduate','session':'not.in.(%22%22)','offset':'0','limit':'50'}
r = requests.get(url, data=payload)
print(r.content)

It gives me this result:

b'{"hint":null,"details":null,"code":"42501","message":"permission denied for table companies"}'
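Separately from the permission error, `data=` is not the right argument for query parameters. `requests` encodes a `data` dict as a form-encoded request body even on a GET, whereas query-string parameters belong in `params=` (or directly in the URL, but not both). Values in such a dict should also be raw, not pre-encoded, or `%22%22` gets escaped a second time. Below is a stdlib sketch of how a parameter dict becomes a query string. It is an illustration only: `requests` uses its own encoding for `params`, which is similar but not guaranteed to be byte-identical.

```python
from urllib.parse import urlencode, quote

base = "https://datacore.techstars.com/companies"

# Raw (unencoded) values; the encoder adds the percent-escapes.
# The stray "}" is kept exactly as in the captured URL.
params = {
    "order": "name",
    "program_status": "in.(session_in_progress,session_over)}",
    "type": "eq.Graduate",
    "session": 'not.in.("")',
    "offset": 0,
    "limit": 50,
}

# Keep "(", ")" and "," literal so the result resembles the URL in the Network tab.
query = urlencode(params, safe="(),", quote_via=quote)
print(f"{base}?{query}")

# Pitfall: an already-encoded value is escaped again ("%" becomes "%25").
print(urlencode({"session": "not.in.(%22%22)"}, safe="(),", quote_via=quote))
# -> session=not.in.(%2522%2522)
```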

1 answer:

Answer 0 (score: 1)

You also need to send some additional headers for the request to work. For example:

import requests

# Browser-like headers; the Authorization header is the one that is required.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36',
    'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoicGdyZXN0X3d3dzIifQ.RB9HicmPNEl4C0Ree9SVw3Oh5tinjDiIbBurBujVnEg',
    'Accept': 'application/json, text/plain, */*',
    'Origin': 'https://www.techstars.com'
}

# Same endpoint, additionally filtered to eight specific company IDs (id=in.(...)).
url = "https://datacore.techstars.com/companies?order=name&program_status=in.(session_in_progress,session_over)}&type=eq.Graduate&session=not.in.(%22%22)&id=in.(001E000001EZFcYIAX,001E000000I0FdNIAV,001E000000SsjXdIAJ,001E000000HzxB9IAJ,001E000000IyUe7IAF,001E000000HzKYCIA3,001E000000IIItfIAH,001E000000IyUJRIA3)}&offset=0&limit=50"
r = requests.get(url, headers=headers)
r.raise_for_status()  # fail loudly on an HTTP error status

# The response body is a JSON list of company objects.
for entry in r.json():
    print(f"{entry['name']} - {entry['description']}")

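An Authorization header is required. Its value can probably be found in the main page's HTML; it also appears in the request headers shown in the browser's Network tab.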
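The token in the answer is a JWT, so its middle (payload) segment is just base64url-encoded JSON and can be inspected without any secret. Here it contains only a role name, which suggests a site-wide token embedded in the front end rather than one tied to a user login. Without it, the request presumably runs without that role, which would fit the question's error: code `42501` is PostgreSQL's `insufficient_privilege`. Both points are inferences, not documented by the site. A minimal stdlib sketch; `jwt_segment` is a hypothetical helper, and it decodes the token but does NOT verify the signature:

```python
import base64
import json

token = ("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
         ".eyJyb2xlIjoicGdyZXN0X3d3dzIifQ"
         ".RB9HicmPNEl4C0Ree9SVw3Oh5tinjDiIbBurBujVnEg")

def jwt_segment(token: str, index: int) -> dict:
    """Decode one JWT segment (0 = header, 1 = payload). No signature check."""
    segment = token.split(".")[index]
    # JWTs strip base64 padding; restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

print(jwt_segment(token, 0))  # -> {'alg': 'HS256', 'typ': 'JWT'}
print(jwt_segment(token, 1))  # -> {'role': 'pgrest_www2'}
```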

This gives output like the following:

Chainalysis - Building the compliance layer for the future of value transfer.
ClassPass - ClassPass is a membership program for fitness classes across multiple gyms and studios, making working out more accessible.
DataRobot - DataRobot brings AI technology and ROI enablement services to global enterprises.
DigitalOcean - The cloud for developers
Outreach - Outreach is a sales engagement platform that accelerates revenue growth by optimizing interactions throughout the customer lifecycle.
Remitly - Remitly is a mobile payments service that enables users to make person-to-person international money transfers.
SendGrid - SendGrid is a cloud-based customer communication platform that drives engagement and business growth.
Zipline - Zipline is creating a highly automated drone network to shuttle blood and pharmaceuticals to remote locations in hours rather than weeks or months.
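The answer's URL is restricted to eight IDs (`id=in.(...)`), which is presumably why exactly eight companies come back. To collect the full list, one option is to drop that filter and advance `offset` by `limit` until a short page arrives. A minimal sketch, assuming the endpoint keeps accepting the same headers and `offset`/`limit` parameters. `iter_companies` and `fetch_page` are hypothetical names: in real use `fetch_page` would wrap `requests.get(...).json()`, and here it is injected so the paging logic runs without network access. The sketch also treats a JSON object (like the error in the question) as a failure instead of iterating over it.

```python
from typing import Callable, Iterator

def iter_companies(fetch_page: Callable[[int, int], object],
                   limit: int = 50) -> Iterator[dict]:
    """Yield records page by page using offset/limit paging."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        # Errors come back as an object, e.g.
        # {"code": "42501", "message": "permission denied for table companies", ...}
        if isinstance(page, dict):
            raise RuntimeError(f"API error {page.get('code')}: {page.get('message')}")
        yield from page
        if len(page) < limit:  # short or empty page: nothing left
            break
        offset += limit

# Fake data standing in for the real endpoint (hypothetical rows).
fake_rows = [{"name": f"Company {i}", "description": "..."} for i in range(120)]

def fake_fetch(offset: int, limit: int) -> list:
    return fake_rows[offset:offset + limit]

companies = list(iter_companies(fake_fetch, limit=50))
print(len(companies))  # -> 120
for entry in companies[:2]:
    print(f"{entry['name']} - {entry['description']}")
```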