Parsing multiple values from HTML with lxml

Time: 2016-02-02 02:19:15

Tags: python html web-scraping lxml

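I'm quite confused about how to use lxml. I normally use regular expressions because I can pull out all the data in one pass, but I don't know how to parse these values with lxml: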

data = tree.xpath('//div[@class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"]')
# extract data from div class: featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2

"M4A4 | Poseidon " + "Factory New"
"9462141"
"195.00"
"https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXH5ApeO4YmlhxYQknCRvCo04DEVlxkKgpou-6kejhjxszYfi5H5di5mr-HnvD8J_WCkmkEvp0pi7zDodv3jAHj-UM5ZGr7INfHJAc9MlzV-FK_kO281pa_ot2XnrA-A3kA/256fx256f"

"Chroma 2 Case Key"
"9462120"
"2.11"
"https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXX7gNTPcUxuxpJSXPbQv2S1MDeXkh6LBBOie3rKFRh16PKd2pDvozixtSOwaP2ar7SlzIA6sEo2rHCpdyhjAGxr0A6MHezetG0RZXdTA/256fx256f"

The HTML I need to parse:

    <div class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2">
    <div>
        <a class="glyphicon glyphicon-search market-name market-search-icon opskins-search-button" href="/?loc=shop_search&amp;sort=lh&amp;search_item=M4A4+%7C+Poseidon+%28Factory+New%29" title="Search"></a> <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462141">
                M4A4 | Poseidon
            </a>
        <div class="item-desc">
            <small class="text-muted">Factory New</small>
            <small style="color:#777777">Classified Rifle</small>
            <small class="item-warning"></small>
        </div>
        <img class="item-img" src="https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXH5ApeO4YmlhxYQknCRvCo04DEVlxkKgpou-6kejhjxszYfi5H5di5mr-HnvD8J_WCkmkEvp0pi7zDodv3jAHj-UM5ZGr7INfHJAc9MlzV-FK_kO281pa_ot2XnrA-A3kA/256fx256f">
        <div class="item-add">
            <div class="item-amount">$195.00</div>
            <div class="market-name" style="padding-bottom:0.3em;"><i class="stm stm-steam" title="Steam Analyst"></i> <a style="color:white;" href="http://csgo.steamanalyst.com/id/115787731/" target="_BLANK">Suggested Price: $258.52</a>
            </div>
            <div class="item-buttons text-center"><a href="steam://rungame/730/76561202255233023/+csgo_econ_action_preview%20S76561198236464786A5000169384D16322433520890898502" class="btn btn-primary" style="margin-right:4px">Inspect</a>
                <button class="btn btn-orange" type="button" id="shopItem" onclick="addToCart(9462141)">Add to Cart</button><span style="margin-left:3px;"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/apps/730/69f7ebe2735c366c65c0b33dae00e12dc40edbe4.jpg" data-appid="730" style="opacity: 0.7; display:inline"></span>
            </div>
        </div>
    </div>
</div>

<div class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2">
    <div>
        <a class="glyphicon glyphicon-search market-name market-search-icon opskins-search-button" href="/?loc=shop_search&amp;sort=lh&amp;search_item=Chroma+2+Case+Key" title="Search"></a> <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462120">
                Chroma 2 Case Key
            </a>
        <div class="item-desc">
            <small class="text-muted"></small>
            <small style="color:#777777">Base Grade Key</small>
            <small class="item-warning"></small>
        </div>
        <img class="item-img" src="https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXX7gNTPcUxuxpJSXPbQv2S1MDeXkh6LBBOie3rKFRh16PKd2pDvozixtSOwaP2ar7SlzIA6sEo2rHCpdyhjAGxr0A6MHezetG0RZXdTA/256fx256f">
        <div class="item-add">
            <div class="item-amount">$2.11</div>
            <div class="market-name" style="padding-bottom:0.3em;"><i class="stm stm-steam" title="Steam Analyst"></i> <a style="color:white;" href="http://csgo.steamanalyst.com/id/100994798/" target="_BLANK">Suggested Price: $2.70</a>
            </div>
            <div class="item-buttons text-center">
                <button class="btn btn-orange" type="button" id="shopItem" onclick="addToCart(9462120)">Add to Cart</button><span style="margin-left:3px;"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/apps/730/69f7ebe2735c366c65c0b33dae00e12dc40edbe4.jpg" data-appid="730" style="opacity: 0.7; display:inline"></span>
            </div>
        </div>
    </div>
</div>

PS: Do I need to run a for loop over each instance of '//div[@class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"]', or will lxml extract each piece of data as a list?

1 answer:

Answer 0 (score: 1)

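xpath returns a list of matching elements; you have to loop over that list with a for loop to get the child elements out of each match.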
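To make that concrete, here is a minimal sketch on a tiny stand-in document. The markup and the names `snippet`, `items`, and `names` are made up for illustration and are not taken from the page in the question. It shows that `xpath()` hands back a plain Python list, and that a query starting with `.` stays inside the element it is called on.

```python
import lxml.html

# Tiny stand-in document (hypothetical markup, not the page from the question).
snippet = '''
<div class="item"><a href="?item=1">First</a></div>
<div class="item"><a href="?item=2">Second</a></div>
'''

root = lxml.html.fromstring(snippet)

# xpath() returns a list of matching elements, one entry per <div class="item">.
items = root.xpath('//div[@class="item"]')
is_list = isinstance(items, list)
count = len(items)

# The leading "." makes each query relative to the current element,
# so every item only sees its own <a>.
names = [item.xpath('.//a/text()')[0] for item in items]

print(is_list, count, names)
```

Looping per element, rather than running one query per field over the whole page, keeps the fields of one item together even when an item is missing a value.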

Example code, shown after the data it parses:

data ='''<div class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2">
    <div>
        <a class="glyphicon glyphicon-search market-name market-search-icon opskins-search-button" href="/?loc=shop_search&amp;sort=lh&amp;search_item=M4A4+%7C+Poseidon+%28Factory+New%29" title="Search"></a> <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462141">
                M4A4 | Poseidon
            </a>
        <div class="item-desc">
            <small class="text-muted">Factory New</small>
            <small style="color:#777777">Classified Rifle</small>
            <small class="item-warning"></small>
        </div>
        <img class="item-img" src="https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXH5ApeO4YmlhxYQknCRvCo04DEVlxkKgpou-6kejhjxszYfi5H5di5mr-HnvD8J_WCkmkEvp0pi7zDodv3jAHj-UM5ZGr7INfHJAc9MlzV-FK_kO281pa_ot2XnrA-A3kA/256fx256f">
        <div class="item-add">
            <div class="item-amount">$195.00</div>
            <div class="market-name" style="padding-bottom:0.3em;"><i class="stm stm-steam" title="Steam Analyst"></i> <a style="color:white;" href="http://csgo.steamanalyst.com/id/115787731/" target="_BLANK">Suggested Price: $258.52</a>
            </div>
            <div class="item-buttons text-center"><a href="steam://rungame/730/76561202255233023/+csgo_econ_action_preview%20S76561198236464786A5000169384D16322433520890898502" class="btn btn-primary" style="margin-right:4px">Inspect</a>
                <button class="btn btn-orange" type="button" id="shopItem" onclick="addToCart(9462141)">Add to Cart</button><span style="margin-left:3px;"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/apps/730/69f7ebe2735c366c65c0b33dae00e12dc40edbe4.jpg" data-appid="730" style="opacity: 0.7; display:inline"></span>
            </div>
        </div>
    </div>
</div>

<div class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2">
    <div>
        <a class="glyphicon glyphicon-search market-name market-search-icon opskins-search-button" href="/?loc=shop_search&amp;sort=lh&amp;search_item=Chroma+2+Case+Key" title="Search"></a> <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462120">
                Chroma 2 Case Key
            </a>
        <div class="item-desc">
            <small class="text-muted"></small>
            <small style="color:#777777">Base Grade Key</small>
            <small class="item-warning"></small>
        </div>
        <img class="item-img" src="https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXX7gNTPcUxuxpJSXPbQv2S1MDeXkh6LBBOie3rKFRh16PKd2pDvozixtSOwaP2ar7SlzIA6sEo2rHCpdyhjAGxr0A6MHezetG0RZXdTA/256fx256f">
        <div class="item-add">
            <div class="item-amount">$2.11</div>
            <div class="market-name" style="padding-bottom:0.3em;"><i class="stm stm-steam" title="Steam Analyst"></i> <a style="color:white;" href="http://csgo.steamanalyst.com/id/100994798/" target="_BLANK">Suggested Price: $2.70</a>
            </div>
            <div class="item-buttons text-center">
                <button class="btn btn-orange" type="button" id="shopItem" onclick="addToCart(9462120)">Add to Cart</button><span style="margin-left:3px;"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/apps/730/69f7ebe2735c366c65c0b33dae00e12dc40edbe4.jpg" data-appid="730" style="opacity: 0.7; display:inline"></span>
            </div>
        </div>
    </div>
</div>'''

import lxml.html

html = lxml.html.fromstring(data)

divs = html.xpath('//div[@class="featured-item col-xs-12 col-sm-6 col-md-4 col-lg-3 center-block app_730_2"]')

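for x in divs:
    # The search-icon link has no text, so the first text node is the item name
    a = x.xpath('.//a/text()')[0]
    print(a.strip())

    # Condition (e.g. "Factory New"); this <small> is empty for some items
    small = x.xpath('.//small[@class="text-muted"]/text()')
    if small:
        print(small[0])

    # Price, including the leading "$"
    div = x.xpath('.//div[@class="item-amount"]/text()')[0]
    print(div)

    # The second link's href ends in "item=<id>"
    a_href = x.xpath('.//a/@href')
    item = a_href[1].split('=')[-1]
    print(item)

    # Image URL
    img = x.xpath('.//img[@class="item-img"]/@src')[0]
    print(img)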

Output:

M4A4 | Poseidon
Factory New
$195.00
9462141
https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXH5ApeO4YmlhxYQknCRvCo04DEVlxkKgpou-6kejhjxszYfi5H5di5mr-HnvD8J_WCkmkEvp0pi7zDodv3jAHj-UM5ZGr7INfHJAc9MlzV-FK_kO281pa_ot2XnrA-A3kA/256fx256f
Chroma 2 Case Key
$2.11
9462120
https://steamcommunity-a.akamaihd.net/economy/image/-9a81dlWLwJ2UUGcVs_nsVtzdOEdtWwKGZZLQHTxDZ7I56KU0Zwwo4NUX4oFJZEHLbXX7gNTPcUxuxpJSXPbQv2S1MDeXkh6LBBOie3rKFRh16PKd2pDvozixtSOwaP2ar7SlzIA6sEo2rHCpdyhjAGxr0A6MHezetG0RZXdTA/256fx256f
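If the goal is one record per item, closer to the grouped values listed in the question, the same loop can collect dictionaries instead of printing. The following is only a sketch under some assumptions: `sample` is a trimmed-down copy of the markup with placeholder image URLs, and `has_class` and `parse_items` are hypothetical helper names. It matches on a single class token, because `@class="..."` compares the whole attribute string and stops matching as soon as the site changes one of the layout classes.

```python
import lxml.html
from urllib.parse import parse_qs, urlsplit

# Trimmed-down copy of the markup from the question.
# The image URLs are placeholders, not the real ones.
sample = '''
<div class="featured-item col-xs-12 center-block app_730_2">
  <a class="market-name market-search-icon" href="/?loc=shop_search" title="Search"></a>
  <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462141">
      M4A4 | Poseidon
  </a>
  <div class="item-desc"><small class="text-muted">Factory New</small></div>
  <img class="item-img" src="https://example.com/a.png">
  <div class="item-amount">$195.00</div>
</div>
<div class="featured-item col-xs-12 center-block app_730_2">
  <a class="market-name market-search-icon" href="/?loc=shop_search" title="Search"></a>
  <a class="market-name market-link" href="?loc=shop_view_item&amp;item=9462120">
      Chroma 2 Case Key
  </a>
  <div class="item-desc"><small class="text-muted"></small></div>
  <img class="item-img" src="https://example.com/b.png">
  <div class="item-amount">$2.11</div>
</div>
'''


def has_class(name):
    """XPath 1.0 test: the class attribute contains `name` as a whole token."""
    return 'contains(concat(" ", normalize-space(@class), " "), " %s ")' % name


def parse_items(html_text):
    """Return one dict per featured item (hypothetical helper)."""
    root = lxml.html.fromstring(html_text)
    rows = []
    for item in root.xpath('//div[%s]' % has_class('featured-item')):
        link = item.xpath('.//a[%s]' % has_class('market-link'))[0]
        query = parse_qs(urlsplit(link.get('href')).query)
        price = item.xpath('string(.//div[@class="item-amount"])')
        rows.append({
            'name': link.text_content().strip(),
            # string() yields '' when the element is empty or missing
            'condition': item.xpath('string(.//small[@class="text-muted"])').strip(),
            'item_id': query['item'][0],
            'price': price.strip().lstrip('$'),
            'image': item.xpath('string(.//img[@class="item-img"]/@src)'),
        })
    return rows


rows = parse_items(sample)
for row in rows:
    print(row)
```

Using `string(...)` instead of indexing into a result list avoids an `IndexError` for items like the case key, whose condition is empty.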