Scraping multiple similar lines with Python

Date: 2018-12-04 20:43:51

Tags: python-3.x web-scraping beautifulsoup

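Using a simple request, I am trying to get some information stored in the "alt" attribute from this HTML page. The problem is that each instance spreads the information across several lines that start with "img". When I try to access it, I can only read the first "img" and not the rest, and I am not sure how to get them all. Here is the HTML: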

<div class="archetype-tile-description-wrapper">
    <div class="archetype-tile-description">
        <h2>
            <span class="deck-price-online">
                <a href="/archetype/standard-golgari-midrange-60634#online">Golgari Midrange</a>
            </span>
            <span class="deck-price-paper">
                <a href="/archetype/standard-golgari-midrange-60634#paper">Golgari Midrange</a>
            </span>
        </h2>
        <div class="manacost-container">
            <span class="manacost">
                <img alt="b" class="common-manaCost-manaSymbol sprite-mana_symbols_b" src="//assets1.mtggoldfish.com/assets/s-d69cbc552cfe8de4931deb191dd349a881ff4448ed3251571e0bacd0257519b1.gif" />
                <img alt="g" class="common-manaCost-manaSymbol sprite-mana_symbols_g" src="//assets1.mtggoldfish.com/assets/s-d69cbc552cfe8de4931deb191dd349a881ff4448ed3251571e0bacd0257519b1.gif" />
            </span>
        </div>
        <ul>
            <li>Jadelight Ranger</li>
            <li>Merfolk Branchwalker</li>
            <li>Vraska's Contempt</li>
        </ul>
    </div>
</div>

What I want to get out of this is "b" and "g", stored in a variable.

1 answer:

Answer 0 (score: 0)

You can probably grab the <img> elements with the class "common-manaCost-manaSymbol" like this:

imgs = soup.find_all("img",{"class":"common-manaCost-manaSymbol"})

Then you can loop over the <img> elements and read each one's alt attribute.

alts = []
for i in imgs:
    alts.append(i['alt'])

Or with a list comprehension:

alts = [i['alt'] for i in imgs]
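
For reference, here is a minimal self-contained sketch of the whole approach. The question does not show the original code, so the find() call is only a guess at why just the first symbol came back. The html string is a trimmed copy of the markup from the question (src attributes dropped), and the built-in "html.parser" backend is an assumption.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (src attributes omitted).
html = """
<div class="manacost-container">
    <span class="manacost">
        <img alt="b" class="common-manaCost-manaSymbol sprite-mana_symbols_b" />
        <img alt="g" class="common-manaCost-manaSymbol sprite-mana_symbols_g" />
    </span>
</div>
"""

# "html.parser" ships with Python; another parser may be used instead.
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag, which would explain
# seeing a single symbol (assumed cause, not confirmed by the question).
first_img = soup.find("img", {"class": "common-manaCost-manaSymbol"})
first_alt = first_img["alt"]

# find_all() returns every matching tag, so all alt values can be collected.
imgs = soup.find_all("img", {"class": "common-manaCost-manaSymbol"})
alts = [img["alt"] for img in imgs]

print(first_alt)
print(alts)
```

On a full page with several decks, find_all() on the whole soup returns the symbols of every deck in one flat list. Calling it on each "manacost" span instead would keep them grouped per deck.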