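I want to scrape product information from the div below, but when I prettify the HTML I cannot find the main div in it.
The elements I want are inside the following script. I need to know how to extract the data from that script:
My code is as follows: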
import requests
from bs4 import BeautifulSoup
url = "https://www.daraz.pk/catalog/?q=dell&_keyori=ss&from=input&spm=a2a0e.searchlist.search.go.57446b5079XMO8"
page = requests.get(url)
print(page.status_code)
print(page.text)
soup = BeautifulSoup(page.text, 'lxml')
print(soup.prettify())
1 Answer:
Answer 0 (score: 1)
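Just use .find() or .find_all() to locate the script element. When I do that, I see that its content is actually in JSON format, so you can read that element and store all the data that way.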
import requests
from bs4 import BeautifulSoup
import json
import re
url = "https://www.daraz.pk/catalog/?q=dell&_keyori=ss&from=input&spm=a2a0e.searchlist.search.go.57446b5079XMO8"
page = requests.get(url)
print(page.status_code)
print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())
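# Collect every JSON-LD <script> block embedded in the page
alpha = soup.find_all('script',{'type':'application/ld+json'})
# The second block is the one that holds the product list
jsonObj = json.loads(alpha[1].text)
for item in jsonObj['itemListElement']:
    name = item['name']
    price = item['offers']['price']
    currency = item['offers']['priceCurrency']
    # Keep only the last path segment of the availability value, e.g. 'InStock'
    availability = item['offers']['availability'].split('/')[-1]
    # Split on capital letters and rejoin with spaces: 'InStock' -> 'In Stock'
    availability = [s for s in re.split("([A-Z][^A-Z]*)", availability) if s]
    availability = ' '.join(availability)
    # Separate name so the request URL above is not overwritten
    item_url = item['url']
    print('Availability: %s Price: %0.2f %s Name: %s' %(availability,float(price), currency,name))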
Output:
Availability: In Stock Price: 82199.00 Rs. Name: DELL INSPIRON 15 5570 - 15.6"HD - CI5 - 8THGEN - 4GB - 1TB HDD - AMD RADEON 530 2GB GDDR5.
Availability: In Stock Price: 94599.00 Rs. Name: DELL INSPIRON 15 3576 - 15.6"HD - CI7 - 8THGEN - 4GB - 1TB HRD - AMD Radeon 520 with 2GB GDDR5.
Availability: In Stock Price: 106399.00 Rs. Name: DELL INSPIRON 15 5570 - 15.6"HD - CI7 - 8THGEN - 8GB - 2TB HRD - AMD RADEON 530 2GB GDDR5.
Availability: In Stock Price: 17000.00 Rs. Name: Dell Latitude E6420 14-inch Notebook 2.50 GHz Intel Core i5 4GB 320GB Laptop
Availability: In Stock Price: 20999.00 Rs. Name: Dell Core i5 6410 8GB Ram Wi-Fi Windows 10 Installed ( Refurb )
Availability: In Stock Price: 18500.00 Rs. Name: Core i-5 Laptop Dell 4GB Ram 15.6 " Display Windows 10 DVD+Rw ( Refurb )
Availability: In Stock Price: 8500.00 Rs. Name: Laptop Dell D620 Core 2 Duo 80_2Gb (Used)
...
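The availability handling in the loop is the least obvious step, so here is a minimal standalone sketch of it. The helper name humanize_availability and the sample values are made up for illustration, and it uses re.sub with a zero-width pattern in place of the re.split approach above (this needs Python 3.7 or later); for values such as 'InStock' both approaches should give the same result.

```python
import re

def humanize_availability(value):
    """Turn an availability value such as '.../InStock' into 'In Stock'.

    Hypothetical helper, not part of the original answer.
    """
    # Keep only the last path segment, e.g. 'InStock'
    last_segment = value.split('/')[-1]
    # Insert a space wherever a lowercase letter is followed by an uppercase one
    return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', last_segment)

# Illustrative inputs only; the real values come from item['offers']['availability']
print(humanize_availability('http://schema.org/InStock'))
print(humanize_availability('OutOfStock'))
```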
Edit: to see the difference between the two JSON structures:
jsonObj_0 = json.loads(alpha[0].text)
jsonObj_1 = json.loads(alpha[1].text)
print(json.dumps(jsonObj_0, indent=4, sort_keys=True))
print(json.dumps(jsonObj_1, indent=4, sort_keys=True))
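Since the page carries more than one JSON-LD block, hardcoding index 1 may break if the order ever changes. A possible alternative, sketched under the assumption that the product block is the only one whose 'itemListElement' entries contain an 'offers' key (which is how the loop above reads it), is to pick the block by its content. The function name find_product_list and the sample strings are hypothetical; the sketch works on the raw text of each script tag so it only needs the json module.

```python
import json

def find_product_list(json_texts):
    """Return the first parsed JSON object whose 'itemListElement'
    entries carry an 'offers' key, or None if there is no such block.

    Hypothetical helper; the selection rule is an assumption about the page.
    """
    for text in json_texts:
        try:
            data = json.loads(text)
        except (TypeError, ValueError):
            # Skip empty tags (None) and anything that is not valid JSON
            continue
        if not isinstance(data, dict):
            continue
        items = data.get('itemListElement')
        if isinstance(items, list) and any(
            isinstance(entry, dict) and 'offers' in entry for entry in items
        ):
            return data
    return None

# Made-up stand-ins for the text of the two script tags
sample_blocks = [
    '{"@type": "Organization", "name": "Example Shop"}',
    '{"itemListElement": [{"name": "Example laptop", '
    '"offers": {"price": "100.00", "priceCurrency": "Rs."}}]}',
]
product_list = find_product_list(sample_blocks)
print(product_list['itemListElement'][0]['name'])
```

With the real page, the input would be something like [tag.string for tag in alpha], and the returned object could replace jsonObj in the loop.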