I only want to extract the src attribute of each img tag. How can I do that? I would also like the values collected in a list.
from bs4 import BeautifulSoup as bs
import pandas as pd
html = '''
<div class="exp-grid-item-colorways-wrapper">
</div>
<div class="color-options" style="overflow: hidden; position: relative; z-index: 2; left: 0px; width: 180px;">
<ul style="margin: 0px; padding: 0px; position: relative; list-style-type: none; z-index: 1; width: 244px;">
<li style="">
<a class="color-chip" data-lp="$130" data-op="" data-bp="$78" data-obp="$130" data-coming-soon="false" data-product="amazon.com" data-pre-order="false" data-in-stock="true" data-sprite-index="0" data-imgurl="https://images.amazon.com/is/image/DotCom/pwp_sheet2?$amazon_PWPx3$&$img0=AQ1189_001&$img1=AQ1189_006&$img2=AQ1189_010" href="https://www.amazon.com/t/metcon-5-training-shoe-lFwjMP/AQ1189-001" style="overflow: hidden; float: left;" data-iscached="true">
<img src="https://images.amazon.com/is/image/DotCom/pwp_sheet2?$amazon_PWPx3$&$img0=AQ1189_001&$img1=AQ1189_006&$img2=AQ1189_010" class="sprite-sheet sprite-index-0">
</a>
</li>
<li>
<a class="color-chip" data-lp="$130" data-op="" data-bp="$78" data-obp="$130" data-coming-soon="false" data-product="amazon.com" data-pre-order="false" data-in-stock="true" data-sprite-index="1" data-imgurl="https://images.amazon.com/is/image/DotCom/pwp_sheet2?$amazon_PWPx3$&$img0=AQ1189_001&$img1=AQ1189_006&$img2=AQ1189_010" href="https://www.amazon.com/t/metcon-5-training-shoe-lFwjMP" style="overflow: hidden; float: left;" data-iscached="true">
<img src="https://images.amazon.com/is/image/DotCom/pwp_sheet2?$amazon_PWPx3$&$img0=AQ1189_001&$img1=AQ1189_006&$img2=AQ1189_010" class="sprite-sheet sprite-index-1">
</a>
</li>
<li>
<a class="color-chip" data-lp="$130" data-op="" data-bp="$78" data-obp="$130" data-coming-soon="false" data-product="amazon.com" data-pre-order="false" data-in-stock="true" data-sprite-index="2" data-imgurl="https://images.amazon.com/is/image/DotCom/pwp_sheet2?$amazon_PWPx3$&$img0=AQ1189_001&$img1=AQ1189_006&$img2=AQ1189_010" href="https://www.amazon.com/t/metcon-5-training-shoe-lFwjMP/AQ1189-010" style="overflow: hidden; float: left;" data-iscached="true">
<img src="https://images.amazon.com/is/image/DotCom/pwp_sheet2?$amazon_PWPx3$&$img0=AQ1189_001&$img1=AQ1189_006&$img2=AQ1189_010" class="sprite-sheet sprite-index-2">
</a>
</li>
<li>
<a class="color-chip" data-lp="$130" data-op="" data-bp="$78" data-obp="$130" data-coming-soon="false" data-product="amazon.com" data-pre-order="false" data-in-stock="true" data-sprite-index="3" data-imgurl="https://images.amazon.com/is/image/DotCom/AQ1189_344?$amazon_PWP_GRAY$" href="https://www.amazon.com/t/metcon-5-training-shoe-lFwjMP/AQ1189-344" style="overflow: hidden; float: left;" data-iscached="true">
<img data-src="https://images.amazon.com/is/image/DotCom/AQ1189_344?$amazon_PWP_GRAY$" src="https://images.amazon.com/is/image/DotCom/AQ1189_344?$amazon_PWP_GRAY$">
</a>
</div>
'''
soup = bs(html, "html.parser")
items = soup.select('.exp-grid-item-colorways-wrapper')
lista = []
# find_all returns the whole <img> tags, not just their src values
imurl = soup.find_all('img')
print(imurl)
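Answer 0 (score: 2)
Here is one way to do it. Call get("src") on each img tag and collect the results in a list comprehension: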
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
src_list = [i.get("src") for i in soup.find_all('img')]
print(src_list)
Your output will be:
['https://images.amazon.com/is/image/DotCom/pwp_sheet2?$amazon_PWPx3$&$img0=AQ1189_001&$img1=AQ1189_006&$img2=AQ1189_010', 'https://images.amazon.com/is/image/DotCom/pwp_sheet2?$amazon_PWPx3$&$img0=AQ1189_001&$img1=AQ1189_006&$img2=AQ1189_010', 'https://images.amazon.com/is/image/DotCom/pwp_sheet2?$amazon_PWPx3$&$img0=AQ1189_001&$img1=AQ1189_006&$img2=AQ1189_010', 'https://images.amazon.com/is/image/DotCom/AQ1189_344?$amazon_PWP_GRAY$']
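Note that get("src") returns None for any img tag that has no src attribute, and the same URL can show up several times (as it does above). If that matters, something along these lines might help. This is only a rough sketch: sample_html and its example.com URLs are made up for illustration, and the data-src fallback assumes the page uses that attribute for lazy-loaded images, as the last img in the question seems to.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not the asker's HTML); the URLs are placeholders.
sample_html = '''
<ul>
  <li><img src="https://example.com/a.png" class="sprite-sheet"></li>
  <li><img data-src="https://example.com/b.png"></li>
  <li><img src="https://example.com/a.png"></li>
  <li><img alt="no source at all"></li>
</ul>
'''

soup = BeautifulSoup(sample_html, "html.parser")

# The CSS selector img[src] matches only <img> tags that carry a src attribute,
# so no None values end up in the list.
src_list = [img["src"] for img in soup.select("img[src]")]

# Assumption: lazy-loaded images keep their URL in data-src, so fall back to it.
with_fallback = [
    img.get("src") or img.get("data-src")
    for img in soup.find_all("img")
    if img.get("src") or img.get("data-src")
]

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_src = list(dict.fromkeys(src_list))

print(src_list)
print(with_fallback)
print(unique_src)
```

Whether you want the duplicates removed depends on what you do with the list afterwards; if each entry has to line up with a color chip, keep them.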
Hope this is what you were looking for.
Happy coding :)