Scraping e-commerce websites using Python

Time: 2020-01-09 12:35:30

Tags: python python-requests

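I am building a Telegram bot that you can send any product link to (from Myntra, Amazon, or Flipkart). Whenever the price drops, it will send the user a message. Here is my code for scraping the price from Flipkart and Myntra: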

import requests
from bs4 import BeautifulSoup


URL = 'https://www.myntra.com/sports-sandals/roadster/roadster-men-charcoal-grey-sports-sandals/9024251/buy'

# The HTTP header is named "User-Agent"; a "user_agents" key is not treated as the browser string.
head = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}

page = requests.get(URL, headers=head)

soup = BeautifulSoup(page.content, "html.parser")

# Decide which site is being scraped from the URL itself,
# rather than re-parsing the page and splitting its text on ".".
if "flipkart" in URL:
    # class_ takes a plain string (or a list), not a set
    title = soup.find(class_="_35KyD6").get_text()
    price = soup.find(class_="_1vC4OE _3qQ9m1").get_text()
    print(title)
    print(price)

if "myntra" in URL:
    # find() returns None when no matching element exists in the downloaded HTML
    price = soup.find(class_="pdp-price")
    name = soup.find(class_="pdp-name")
    #title = soup.find("div class=\"pdp-price-info\"")
    print(price)

This code extracts the price and name from Flipkart, but for Myntra both `price` and `name` come back as None. I want to get the name as highlighted in the image.
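The price-drop check the bot needs could be sketched roughly as follows. This is only an illustration: `parse_price` and `price_dropped` are hypothetical helper names, the "Rs. 1,299" string format is an assumption about what the scraped text looks like, and sending the Telegram message is left out.

```python
import re


def parse_price(text):
    """Turn a scraped price string such as 'Rs. 1,299' into a float, or None."""
    # Take the first run of digits, allowing thousands separators and a decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


def price_dropped(previous, current):
    """True only when both prices are known and the new one is lower."""
    return previous is not None and current is not None and current < previous


# Made-up example values, not real scraped data
old_price = parse_price("Rs. 1,299")
new_price = parse_price("Rs. 999")
if price_dropped(old_price, new_price):
    print(f"Price dropped from {old_price} to {new_price}")
```

Returning None for an unparseable price keeps a failed scrape from being mistaken for a price drop.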

1 answer:

Answer 0 (score: 2)

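The page data is populated dynamically by JavaScript from a JSON object, which is why BeautifulSoup finds no `pdp-price` or `pdp-name` elements in the downloaded HTML. The JSON is not loaded through a separate XHR request, though. It is embedded in the page's HTML, in a script tag that assigns it to `window.__myx`, so you can extract it with a regex and convert it into a dictionary: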

import re
import json
import requests

url = 'https://www.myntra.com/sports-sandals/roadster/roadster-men-charcoal-grey-sports-sandals/9024251/buy'
# The HTTP header is named "User-Agent", not "user_agents"
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}

response = requests.get(url, headers=headers)

# The product data is assigned to window.__myx inside a <script> tag
match = re.search(r"<script>window\.__myx = (.+?)</script>", response.text)
if match is None:
    raise SystemExit("window.__myx JSON not found in the page HTML")

# Convert the JSON text into a Python dictionary
json_data = json.loads(match.group(1))

product_name = json_data['pdpData']['name']
mrp = json_data['pdpData']['price']['mrp']
selling_price = json_data['pdpData']['price']['discounted']

print('ProductName:', product_name)
print('MRP:', mrp)
print('SellingPrice:', selling_price)
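
The extraction step can also be tried without a network request, which helps when the site blocks the request or changes its markup. The sketch below runs the same regex-plus-`json.loads` idea against a made-up HTML string. `SAMPLE_HTML`, its values, and the `extract_product` helper are assumptions for illustration only; the real page may be structured differently.

```python
import json
import re

# Hypothetical stand-in for the real page, with made-up values
SAMPLE_HTML = (
    '<html><head><script>window.__myx = '
    '{"pdpData": {"name": "Example Sandals", '
    '"price": {"mrp": 1000, "discounted": 750}}}'
    '</script></head><body></body></html>'
)

# Capture whatever is assigned to window.__myx inside the script tag
MYX_PATTERN = re.compile(r"<script>window\.__myx = (.+?)</script>", re.DOTALL)


def extract_product(html):
    """Return (name, mrp, selling_price), or None if the data cannot be read."""
    match = MYX_PATTERN.search(html)
    if match is None:
        return None
    try:
        pdp = json.loads(match.group(1))["pdpData"]
        return pdp["name"], pdp["price"]["mrp"], pdp["price"]["discounted"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed JSON or an unexpected layout
        return None


print(extract_product(SAMPLE_HTML))                          # ('Example Sandals', 1000, 750)
print(extract_product("<html><body>blocked</body></html>"))  # None
```

Returning None instead of raising lets the caller decide what to do when the script tag is missing, for example retrying later.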