Question

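I am scraping this website with Beautiful Soup and need to get the complete product names. When I use the h2 tag, I end up with names such as "NIVEA Soft Moisturizing Cream Berry Scent...".

I don't want the dots at the end, just the full name. This is the code snippet I use to scrape the data: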

div_soup=data_soup.findAll('div',{'class':'product-list-box card desktop-cart'})

table_rows=[]
for div in div_soup:
   current_row=[]
   product_name=div.findAll('h2',{})
   product_price=div.findAll('span',{'class':'post-card__content-price-offer'})
   for idx,data in enumerate(product_name):
       current_row.append(data.text)
   for idx,data in enumerate(product_price):
       current_row.append(data.text)
   table_rows.append(current_row)

I can't work out which tag is appropriate, or whether I should pass something in the dictionary.

URL of the site I'm scraping: https://www.nykaa.com/skin/moisturizers/face-moisturizer-day-cream/c/8394?root=nav_3

Answer 1

for idx,data in enumerate(product_name):
   if data.get('title') is not None:
       current_row.append(data['title'])

should do the job.

It would be better to refactor the code to

product_name=div.find('h2', {'title': True}).get('title')

That way you only look for h2 tags that have a title attribute, and you can avoid the for loop.
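As a minimal sketch of the approach above, the following runs against a small inline HTML snippet rather than the live site. The markup, the product name, the price, and the helper name extract_rows are all hypothetical and only modeled on the class names in the question; the real page structure may differ. It also guards against find returning None, which the one-liner does not.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real site):
# the visible h2 text is truncated, the title attribute holds the full name.
html = """
<div class="product-list-box card desktop-cart">
  <h2 title="Example Cream Full Product Name">Example Cream Full...</h2>
  <span class="post-card__content-price-offer">299</span>
</div>
<div class="product-list-box card desktop-cart">
  <h2>Heading without a title attribute...</h2>
</div>
"""


def extract_rows(markup):
    """Return [full_name, price] rows for cards whose h2 has a title attribute."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for div in soup.find_all("div", class_="product-list-box"):
        # {'title': True} matches only h2 tags that carry a title attribute.
        heading = div.find("h2", {"title": True})
        if heading is None:
            continue  # skip cards without a full name instead of raising
        price = div.find("span", class_="post-card__content-price-offer")
        rows.append([heading["title"], price.get_text(strip=True) if price else None])
    return rows


rows = extract_rows(html)
print(rows)
```

Skipping cards with no matching h2 is a design choice for this sketch; falling back to the visible h2 text would be an equally reasonable alternative.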

How do I scrape the text in a title attribute using Beautiful Soup?
