Web scraping with Python BeautifulSoup: getting the value after the equals sign

Date: 2016-04-23 22:02:48

Tags: python web-scraping beautifulsoup


Using Python, I need to get the name of the color.

For example, given data-label="BLACK", I need BLACK as the output.

So far I have:

Color = section.find("div", {"class": "sfa-pa-product-swatches-thumbnails-container"})
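Finding the container is only the first step; the color names live in the data-label attribute of the swatch elements inside it. A minimal sketch of the missing step, assuming the swatches are div elements with class sfa-pa-product-swatches-color (the class the answer targets). The HTML string below is a hypothetical stand-in, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names in the question;
# the real page structure may differ.
html = """
<section>
  <div class="sfa-pa-product-swatches-thumbnails-container">
    <div class="sfa-pa-product-swatches-color" data-label="BLACK"></div>
    <div class="sfa-pa-product-swatches-color" data-label="BLUSH"></div>
  </div>
</section>
"""

section = BeautifulSoup(html, "html.parser")

# Same lookup as in the question: locate the swatch container first.
container = section.find("div", {"class": "sfa-pa-product-swatches-thumbnails-container"})

# Then read the data-label attribute of each swatch inside it.
# Tag objects support dict-style attribute access: tag["data-label"].
color_names = [swatch["data-label"]
               for swatch in container.find_all("div", class_="sfa-pa-product-swatches-color")]

print(color_names)  # ['BLACK', 'BLUSH']
```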

1 answer:

Answer 0 (score: 2)

Iterate over the "thumbnail" (swatch) elements and read the value of each one's data-label attribute:

colors = [elm.get("data-label", "No color specified") 
          for elm in soup.find_all("div", class_="sfa-pa-product-swatches-color")]
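The snippet above uses elm.get(...) with a fallback rather than elm["data-label"]. A small sketch of the difference, using a hypothetical two-swatch snippet in which one swatch has no data-label: .get() returns the fallback, while [] indexing raises KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second swatch has no data-label attribute.
html = """
<div class="sfa-pa-product-swatches-color" data-label="BLACK"></div>
<div class="sfa-pa-product-swatches-color"></div>
"""
soup = BeautifulSoup(html, "html.parser")
swatches = soup.find_all("div", class_="sfa-pa-product-swatches-color")

# .get() behaves like dict.get(): it returns the fallback when the attribute is absent.
colors = [elm.get("data-label", "No color specified") for elm in swatches]
print(colors)  # ['BLACK', 'No color specified']

# Indexing with [] raises KeyError for a missing attribute instead.
try:
    strict = [elm["data-label"] for elm in swatches]
except KeyError:
    strict = None  # at least one swatch lacks data-label
```

Use .get() when some products may have no color swatch label; use [] when a missing attribute should be treated as an error.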

Full code:

import requests
from bs4 import BeautifulSoup

url = "http://www.saksfifthavenue.com/Handbags/shop/_/N-52jzot/Ne-6lvnb5?FOLDER%3C%3Efolder_id=2534374306622829"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")

# Each product block has an id starting with "product-" inside #product-container
for product in soup.select("#product-container [id^=product-]"):
    name_tag = product.find(class_="product-designer-name")
    product_name = name_tag.get_text(strip=True) if name_tag else None
    # Collect the data-label value of every color swatch in this product
    colors = [elm.get("data-label")
              for elm in product.find_all("div", class_="sfa-pa-product-swatches-color")]
    print(product_name, colors)

Prints (Python 2 output, hence the tuples and u'' prefixes):

(u'Bao Bao Issey Miyake', [u'SILVER'])
(u'Nancy Gonzalez', [u'BLACK', u'BLUSH'])
(u'Bao Bao Issey Miyake', [u'GUNMETAL'])
...
(u'Saint Laurent', [u'WINE'])
(u'Fendi', [u'BLUE', u'FUCHSIA', u'WATER GREEN'])
(u'Prada', [u'LAGO'])
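The live page may no longer serve this markup, so the loop above can be checked offline. A minimal sketch against hypothetical HTML that only mirrors the selectors used in the answer (it is not copied from the site). It swaps find_all for a CSS selector with a [data-label] attribute filter, so swatches without the attribute are skipped instead of raising KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the selectors used in the answer;
# it is not copied from the real site.
html = """
<div id="product-container">
  <div id="product-1">
    <span class="product-designer-name">Nancy Gonzalez</span>
    <div class="sfa-pa-product-swatches-color" data-label="BLACK"></div>
    <div class="sfa-pa-product-swatches-color" data-label="BLUSH"></div>
  </div>
  <div id="product-2">
    <span class="product-designer-name">Prada</span>
    <div class="sfa-pa-product-swatches-color" data-label="LAGO"></div>
    <div class="sfa-pa-product-swatches-color"></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

results = {}
# Descendants of #product-container whose id starts with "product-"
for product in soup.select('#product-container [id^="product-"]'):
    name = product.select_one(".product-designer-name").get_text(strip=True)
    # The [data-label] attribute selector skips swatches without the attribute.
    colors = [elm["data-label"]
              for elm in product.select("div.sfa-pa-product-swatches-color[data-label]")]
    results[name] = colors

print(results)  # {'Nancy Gonzalez': ['BLACK', 'BLUSH'], 'Prada': ['LAGO']}
```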

Also make sure you are not violating the website's Terms of Use; stay on the legal side.