Get the alt value from a div tag using BeautifulSoup

Time: 2019-06-02 13:08:12

Tags: selenium web-scraping beautifulsoup

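I am trying to get the value "4" from the HTML below, taken from this website. It is just one of the values on a product listing page. I want all of the values as a list so that I can put them into a data frame.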

<div class="review-stars-on-hover">
  <div class="product-rating">
    <div class="product-rating__meter" alt="4">
      <div class="product-rating__meter-btm">★★★★★</div>
      <div class="product-rating__meter-top" style="width:80%;">★★★★★</div>
    </div>
    <div class="product-rating__count edf-font-size--xsmall nsg-text--medium-grey" alt="95">(95)</div>
  </div>
</div>...

What I tried:

items = soup.select('.grid-item-content')
star = [item.find('div', {'class': 'review-stars-on-hover'}).get('alt') for item in items]

Output (there are 16 products on the page, but no values are returned):

[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

Any suggestions?

3 answers:

Answer 0 (score: 1)

You can do this by selecting only the first match of the inner class within each parent element:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://store.nike.com/us/en_us/pw/mens-walking-shoes/7puZ9ypZoi3')
soup = bs(r.content, 'lxml')
# Keep only product boxes that contain a rating meter, then read the meter's alt attribute
stars = [item.select_one('.product-rating__meter')['alt'] for item in soup.select('.grid-item-box:has(.product-rating__meter)')]
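
To see why the attempt in the question returned None while this selector works, here is a minimal offline sketch. The HTML string is a hypothetical two-product sample modelled on the markup in the question, not fetched from the site, and `html.parser` is used instead of `lxml` only to avoid an extra dependency. The `:has()` selector assumes a Beautiful Soup version that ships with Soup Sieve (4.7 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the first product has a rating, the second has none.
SAMPLE_HTML = """
<div class="grid-item-box">
  <div class="grid-item-content">
    <div class="review-stars-on-hover">
      <div class="product-rating">
        <div class="product-rating__meter" alt="4"></div>
      </div>
    </div>
  </div>
</div>
<div class="grid-item-box">
  <div class="grid-item-content"></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The outer wrapper carries no alt attribute itself, so .get('alt') is None.
wrapper = soup.select_one(".review-stars-on-hover")
wrapper_alt = wrapper.get("alt")

# Keep only boxes that contain a rating meter, then read alt from the meter.
stars = [
    box.select_one(".product-rating__meter")["alt"]
    for box in soup.select(".grid-item-box:has(.product-rating__meter)")
]

print(wrapper_alt)
print(stars)
```

The alt attribute lives on the nested `product-rating__meter` div, which is why reading it from `review-stars-on-hover` yields nothing.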

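Answer 1 (score: 1)

Try the code below. Note that the class you mentioned returns 16 records, but only 11 of them contain the class product-rating__meter. I added a check for whether the product-rating__meter class is present before reading the alt value. Hope this helps.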

from bs4 import BeautifulSoup
import requests

data = requests.get('https://store.nike.com/us/en_us/pw/mens-walking-shoes/7puZ9ypZoi3').content
soup = BeautifulSoup(data, 'lxml')
items = soup.find_all('div', class_='grid-item-content')
print("Total element count : " + str(len(items)))
for item in items:
    # Not every product has a rating, so look up the meter before reading alt
    meter = item.find('div', class_='product-rating__meter')
    if meter:
        print("Alt value : " + meter['alt'])

Output:


Total element count : 16

Alt value : 4
Alt value : 4.3
Alt value : 4.6
Alt value : 4.8
Alt value : 4.4
Alt value : 4.7
Alt value : 4.7
Alt value : 3.8
Alt value : 4.5
Alt value : 3.3
Alt value : 4.5

Edited:

from bs4 import BeautifulSoup
import requests

data = requests.get('https://store.nike.com/us/en_us/pw/mens-walking-shoes/7puZ9ypZoi3').content
soup = BeautifulSoup(data, 'lxml')
items = soup.find_all('div', class_='grid-item-content')
print("Total element count : " + str(len(items)))
itemlist = []
for item in items:
    # Collect the alt value only for products that have a rating meter
    meter = item.find('div', class_='product-rating__meter')
    if meter:
        itemlist.append("Alt value : " + meter['alt'])
print(itemlist)

Output:

Total element count : 16
['Alt value : 4', 'Alt value : 4.3', 'Alt value : 4.6', 'Alt value : 4.8', 'Alt value : 4.4', 'Alt value : 4.7', 'Alt value : 4.7', 'Alt value : 3.8', 'Alt value : 4.5', 'Alt value : 3.3', 'Alt value : 4.5']
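
Since the question asks for a list that can go into a data frame, it may be useful to keep one entry per product, with None where no rating exists, so the list stays aligned with all 16 products. The following is only a sketch under assumptions: the HTML is a hypothetical three-product sample, `rating_of` is a hypothetical helper name, and `html.parser` stands in for `lxml` to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second product has no rating meter.
SAMPLE_HTML = """
<div class="grid-item-content">
  <div class="product-rating__meter" alt="4"></div>
</div>
<div class="grid-item-content"></div>
<div class="grid-item-content">
  <div class="product-rating__meter" alt="4.3"></div>
</div>
"""


def rating_of(item):
    """Return the rating as a float, or None if the product has no meter."""
    meter = item.find("div", class_="product-rating__meter")
    return float(meter["alt"]) if meter is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
items = soup.find_all("div", class_="grid-item-content")

# One entry per product, so positions match the product order on the page.
ratings = [rating_of(item) for item in items]
print(ratings)
```

A list like `ratings` can then be used as a column, for example `pd.DataFrame({'rating': ratings})` with pandas imported as `pd`.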

Answer 2 (score: 0)

You can write something like the following to retrieve all divs that have an "alt" attribute (here bs is the parsed BeautifulSoup object):

xml = bs.find_all("div", {"alt": True})

And to get the values:

for x in xml:
    print(x["alt"])

If you only want the first "alt", index the result directly, for example xml[0]["alt"].
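
One thing to keep in mind with this approach: in the markup from the question, the review-count div also carries an alt attribute, so filtering on alt alone picks up both values. A small sketch, assuming a hypothetical trimmed copy of that markup and `html.parser` as the parser, shows the difference and one way to narrow the match by class:

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed copy of the markup from the question.
SAMPLE_HTML = """
<div class="product-rating">
  <div class="product-rating__meter" alt="4"></div>
  <div class="product-rating__count" alt="95">(95)</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every div that has an alt attribute: rating and review count alike.
all_alts = [div["alt"] for div in soup.find_all("div", {"alt": True})]

# Restrict to the rating meter by adding a class filter.
meter_alts = [
    div["alt"]
    for div in soup.find_all("div", class_="product-rating__meter", alt=True)
]

print(all_alts)
print(meter_alts)
```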