import requests
from bs4 import BeautifulSoup
import json
import re
url = "https://www.daraz.pk/catalog/?q=dell&_keyori=ss&from=input&spm=a2a0e.searchlist.search.go.57446b5079XMO8"
page = requests.get(url)
print(page.status_code)
print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())
alpha = soup.find_all('script',{'type':'application/ld+json'})
jsonObj = json.loads(alpha[1].text)
for item in jsonObj['itemListElement']:
name = item['name']
price = item['offers']['price']
currency = item['offers']['priceCurrency']
availability = item['offers']['availability'].split('/')[-1]
availability = [s for s in re.split("([A-Z][^A-Z]*)", availability) if s]
availability = ' '.join(availability)
url = item['url']
print('Availability: %s Price: %0.2f %s Name: %s' %(availability,float(price), currency,name, url))
outfile = open('products.csv','w', newline='')
writer = csv.writer(outfile)
writer.writerow(["name", "type", "price", "priceCurrency", "availability" ])
alpha = soup.find_all('script',{'type':'application/ld+json'})
jsonObj = json.loads(alpha[1].text)
for item in jsonObj['itemListElement']:
name = item['name']
type = item['@type']
url = item['url']
price = item['offers']['price']
currency = item['offers']['priceCurrency']
availability = item['offers']['availability'].split('/')[-1]
writer.writerow([name, type, price, currency, availability, URL ])
outfile.close()
答案 0 :(得分:1)
首先,您不在其中包含标题。没什么大不了的,只是第一行的url
列中的标题为空白。因此,包括:
writer.writerow(["name", "type", "price", "priceCurrency", "availability", "url" ])
第二,将字符串存储为url
,然后在编写器中引用URL
。 URL
没有任何价值。实际上,它应该给出URL is not defined
或类似的错误。
并且由于您已经在代码中使用url
使用了url = "https://www.daraz.pk/catalog/?q=dell&_keyori=ss&from=input&spm=a2a0e.searchlist.search.go.57446b5079XMO8"
,所以我也可能会将变量名更改为url_text
。
我可能还会使用变量type_text
或type
以外的其他变量,因为type
是python中的内置函数。
但是您需要更改为:
writer.writerow([name, type, price, currency, availability, url ])
outfile.close()
完整代码:
import requests
from bs4 import BeautifulSoup
import json
import csv
url = "https://www.daraz.pk/catalog/?q=dell&_keyori=ss&from=input&spm=a2a0e.searchlist.search.go.57446b5079XMO8"
page = requests.get(url)
print(page.status_code)
print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())
alpha = soup.find_all('script',{'type':'application/ld+json'})
jsonObj = json.loads(alpha[1].text)
outfile = open('c:\products.csv','w', newline='')
writer = csv.writer(outfile)
writer.writerow(["name", "type", "price", "priceCurrency", "availability" , "url"])
for item in jsonObj['itemListElement']:
name = item['name']
type_text = item['@type']
url_text = item['url']
price = item['offers']['price']
currency = item['offers']['priceCurrency']
availability = item['offers']['availability'].split('/')[-1]
writer.writerow([name, type_text, price, currency, availability, url_text ])
outfile.close()
答案 1 :(得分:0)
我唯一发现错误的地方是,您在最后一行输入错字-大写URL
而不是小写url
。对其进行更改可以使脚本完美运行。