Making a custom request with Python Requests

Asked: 2019-10-06 20:36:44

Tags: python python-3.x post request python-requests

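The URL is https://www.lowes.com/store/AK-Anchorage/2955. That page has a button named "Shop this store". Clicking the button appears to issue the same request as opening the link, yet clicking the button leads to a different page than opening the link directly. I need to make the same request that the button makes.

I need to make a request to "https://www.lowes.com/store/AK-Anchorage/2955", and then make the same request that clicking the button makes.

I tried making the request twice in a row to get the desired page, but had no luck.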

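import requests
from fake_useragent import UserAgent  # assumed source of UserAgent (fake-useragent package)

url = 'https://www.lowes.com/store/AK-Anchorage/2955'
ua = UserAgent()
header = {'User-Agent': str(ua.chrome)}
# Same GET issued twice in a row, hoping the second returns the "button" page
response = requests.get(url, headers=header)
response = requests.get(url, headers=header)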

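2 answers:

Answer 0 (score: 1)

So, this seems to work. I get a 200 OK response both times, but the content length is not the same.

For what it's worth, in Firefox, clicking the blue "Shop this store" button takes me to what looks like exactly the same page, only without the blue button. In Chrome (beta), clicking the blue button shows a 403 Access Denied page. Their server is not behaving consistently, so you may have a hard time achieving what you want.

If I call session.get without headers, I get no response at all. So they are clearly checking the user agent, and possibly cookies and so on.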

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0",
           "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
           "Accept-Language": "en-US,en;q=0.5",
           "Accept-Encoding": "gzip, deflate, br",
           "Upgrade-Insecure-Requests": "1",}

session = requests.Session()

url = "https://www.lowes.com/store/AK-Anchorage/2955"

response1 = session.get(url, headers=headers)
print(response1, len(response1.content))

response2 = session.get(url, headers=headers)
print(response2, len(response2.content))

Output:

<Response [200]> 56282
<Response [200]> 56323

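I did some more testing. If you don't change the default Python Requests user-agent, the server times out. Even changing it to "" seems to be enough to make the server give you a response.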
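To make the user-agent point concrete, here is a minimal sketch that compares the library's default User-Agent with one set once on a session, so every later session.get sends it. The names BROWSER_UA and make_session are made up for this illustration, and no request is sent.

```python
import requests

# Same Firefox User-Agent string used in the examples in this answer.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) "
              "Gecko/20100101 Firefox/69.0")


def make_session(user_agent=BROWSER_UA):
    """Return a Session whose default headers carry the given User-Agent."""
    session = requests.Session()
    # Session-level headers are merged into every request made via the session.
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()
# What requests would send if the header were left untouched.
print("library default:", requests.utils.default_user_agent())
print("session sends:  ", session.headers["User-Agent"])
```

With this in place, session.get(url, timeout=5) no longer needs a headers argument on each call.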

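You don't need to select a specific store to get product information, including the description, specifications, and price. Take a look at this GET request, which uses no cookies and no session: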

import requests, json

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"}

url = "https://www.lowes.com/pd/Google-Nest-Learning-Thermostat-3rd-Gen-Thermostat-and-Room-Sensor-with-with-Wi-Fi-Compatibility/1001080012"

r = requests.get(url, headers=headers, timeout=5)
print("return code:", r)
print("content length:", len(r.content))

for line in r.text.splitlines():
    if "window.digitalData.products = [" in line:
        print("This line includes the 'sellingPrice' and the 'retailPrice'. After some splicing, we can treat it as JSON.")
        left = line.find(" = ") + 3
        right = line.rfind(";")
        print(json.dumps(json.loads(line[left:right]), indent=True))
        break

Output:

return code: <Response [200]>
content length: 107134
This line includes the 'sellingPrice' and the 'retailPrice'. After some splicing, we can treat it as JSON.
[
 {
  "productId": [
   "1001080012"
  ],
  "productName": "Nest_Learning_Thermostat_3rd_Gen_Thermostat_and_Room_Sensor_with_with_Wi-Fi_Compatibility",
  "ivm": "753160-83910-T3007ES",
  "itemNumber": "753160",
  "vendorNumber": "83910",
  "modelId": "T3007ES",
  "type": "ANY",
  "brandName": "Google",
  "superCategory": "Heating & Cooling",
  "quantity": 1,
  "sellingPrice": 249,
  "retailPrice": 249
 }
]
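The slicing above could also be wrapped in a small helper that returns None when the marker line is missing or is not valid JSON, which may be useful if the page layout changes. This is only a sketch: extract_products and MARKER are names invented here, and the sample string is a trimmed stand-in for the real page.

```python
import json

MARKER = "window.digitalData.products = "


def extract_products(html):
    """Return the parsed products list, or None if it cannot be found."""
    for line in html.splitlines():
        start = line.find(MARKER)
        if start == -1:
            continue
        # Keep everything after the marker and drop the trailing semicolon.
        payload = line[start + len(MARKER):].strip().rstrip(";")
        try:
            return json.loads(payload)
        except json.JSONDecodeError:
            return None
    return None


# Trimmed stand-in for the page source, not the real response body.
sample = (
    "<script>\n"
    '  window.digitalData.products = [{"productId": ["1001080012"], '
    '"sellingPrice": 249, "retailPrice": 249}];\n'
    "</script>"
)
products = extract_products(sample)
print(products[0]["sellingPrice"], products[0]["retailPrice"])
```

In real use you would pass r.text from the request above instead of the sample string.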

The product description and specifications can be found in this element:

<section class="pd-information met-product-information grid-100 grid-parent v-spacing-jumbo">

(It is about 300 lines long, so I only copied the parent tag.)

There is an API that takes a product ID and a store number and returns pricing information:

import requests, json

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"}

url = "https://www.lowes.com/PricingServices/price/balance?productId=1001080012&storeNumber=1955"

r = requests.get(url, headers=headers, timeout=5)
print("return code:", r)
print("content length:", len(r.content))
print(json.dumps(json.loads(r.text), indent=True))

Output:

return code: <Response [200]>
content length: 768
[
 {
  "productId": 1001080012,
  "storeNumber": 1955,
  "isSosVendorDirect": true,
  "price": {
   "selling": "249.00",
   "retail": "249.00",
   "typeCode": 1,
   "typeIndicator": "Regular Price"
  },
  "availability": [
   {
    "availabilityStatus": "Available",
    "productStockType": "STK",
    "availabileQuantity": 822,
    "deliveryMethodId": 1,
    "deliveryMethodName": "Parcel Shipping",
    "storeNumber": 907
   },
   {
    "availabilityStatus": "Available",
    "productStockType": "STK",
    "availabileQuantity": 8,
    "leadTime": 1570529161540,
    "deliveryMethodId": 2,
    "deliveryMethodName": "Store Pickup",
    "storeNumber": 1955
   },
   {
    "availabilityStatus": "Available",
    "productStockType": "STK",
    "availabileQuantity": 1,
    "leadTime": 1570529161540,
    "deliveryMethodId": 3,
    "deliveryMethodName": "Truck Delivery",
    "storeNumber": 1955
   }
  ],
  "@type": "item"
 }
]
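A response shaped like the one above can be flattened into one row per delivery method. The sketch below assumes the field names shown in that output (including the "availabileQuantity" spelling, which is how the key appears there); summarize is a name invented for this example, and the sample is a shortened copy of the output.

```python
from decimal import Decimal


def summarize(items):
    """Flatten items into (product, price, method, store, quantity) rows."""
    rows = []
    for item in items:
        # Prices arrive as strings such as "249.00"; Decimal keeps them exact.
        price = Decimal(item["price"]["selling"])
        for entry in item.get("availability", []):
            rows.append((
                item["productId"],
                price,
                entry["deliveryMethodName"],
                entry["storeNumber"],
                entry["availabileQuantity"],  # key spelled as in the output above
            ))
    return rows


# Shortened copy of the output shown above.
sample = [{
    "productId": 1001080012,
    "storeNumber": 1955,
    "price": {"selling": "249.00", "retail": "249.00"},
    "availability": [
        {"availabileQuantity": 822, "deliveryMethodName": "Parcel Shipping",
         "storeNumber": 907},
        {"availabileQuantity": 8, "deliveryMethodName": "Store Pickup",
         "storeNumber": 1955},
    ],
}]

rows = summarize(sample)
for row in rows:
    print(row)
```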

It can take multiple product numbers. For example: https://www.lowes.com/PricingServices/price/balance?productId=1001080046%2C1001135076%2C1001091656%2C1001086418%2C1001143824%2C1001094006%2C1000170557%2C1000920864%2C1000338547%2C1000265699%2C1000561915%2C1000745998&storeNumber=1564
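Rather than assembling that query string by hand, a URL of this shape can be built with the standard library, which percent-encodes the commas as %2C. This is a sketch only; pricing_url and BASE are names chosen for the example.

```python
from urllib.parse import urlencode

BASE = "https://www.lowes.com/PricingServices/price/balance"


def pricing_url(product_ids, store_number):
    """Build the pricing URL for several product IDs and one store number."""
    query = urlencode({
        # urlencode percent-encodes the joining commas as %2C.
        "productId": ",".join(str(p) for p in product_ids),
        "storeNumber": store_number,
    })
    return f"{BASE}?{query}"


url = pricing_url([1001080046, 1001135076], 1564)
print(url)
```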


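You can get information about every store with this API, which returns a 1.6 MB JSON file. maxResults is normally set to 30, and query is your longitude and latitude. I suggest saving the result to disk. I doubt it changes much.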

https://www.lowes.com/wcs/resources/store/10151/storelocation/v1_0?maxResults=2000&query=0%2C0
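One way to follow the save-to-disk advice is a small cache helper that only calls the network when no local copy exists. The sketch below is self-contained: load_or_fetch and fake_fetch are hypothetical names, and the placeholder data does not reflect the real shape of the store-location response.

```python
import json
import tempfile
from pathlib import Path


def load_or_fetch(cache_path, fetch):
    """Return cached JSON from disk, calling fetch() only on a cache miss."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch()
    path.write_text(json.dumps(data), encoding="utf-8")
    return data


calls = []


def fake_fetch():
    """Stand-in for the real download; records how often it is called."""
    calls.append(1)
    return {"placeholder": ["not the real response shape"]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "stores.json"
    first = load_or_fetch(cache, fake_fetch)   # miss: calls fake_fetch
    second = load_or_fetch(cache, fake_fetch)  # hit: reads the file

print("fetch calls:", len(calls))
```

In real use, fetch could be something like lambda: requests.get(store_url, headers=headers, timeout=30).json(), with store_url set to the address above.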

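Keep in mind that the PricingServices/price/balance endpoint can take multiple values for storeNumber, separated by %2C (a comma), so you don't need 1763 separate GET requests. I still made multiple requests, using requests.Session so that the underlying connection is reused.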
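Splitting the store numbers into batches might look like the sketch below. The batch size is an assumption, since the answer does not say how many values the endpoint accepts per request; chunked and store_number_params are names invented for this example.

```python
def chunked(values, size):
    """Yield consecutive slices of values with at most size items each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def store_number_params(store_numbers, batch_size):
    """Return one comma-joined storeNumber value per batch."""
    return [
        ",".join(str(n) for n in batch)
        for batch in chunked(store_numbers, batch_size)
    ]


# batch_size=2 is only for demonstration; the real limit is unknown.
batches = store_number_params([1955, 907, 1564, 2955, 1763], batch_size=2)
print(batches)
```

Each string can then be passed as the storeNumber value of one GET request on a shared requests.Session, for example via params={"productId": ..., "storeNumber": batch}.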

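Answer 1 (score: 0)

It depends on what you want to do with the data. You already have the store ID in the URL.

When you click the button, it sends a request to https://www.lowes.com/store/api/2955 to fetch the store information. Is that what you are looking for?

If so, you don't need two requests; a single request is enough to get the store information you need.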
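As a sketch of that idea, the store ID can be taken from the last path segment of the store page URL and used to form the API address. store_api_url is a name invented for this example, and it assumes the ID is always the final, numeric path segment, as it is in the URL from the question.

```python
from urllib.parse import urlsplit


def store_api_url(store_page_url):
    """Derive the store API URL from a store page URL ending in the store ID."""
    parts = urlsplit(store_page_url)
    # Assumption: the store ID is the last path segment and is numeric.
    store_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not store_id.isdigit():
        raise ValueError(f"no numeric store ID in {store_page_url!r}")
    return f"{parts.scheme}://{parts.netloc}/store/api/{store_id}"


api_url = store_api_url("https://www.lowes.com/store/AK-Anchorage/2955")
print(api_url)
```

A single requests.get(api_url, headers=headers, timeout=5) with a browser-like User-Agent would then replace the two page requests.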