How to scrape product details between <div> tags with XPath

Posted: 2018-02-23 10:06:39

Tags: python-3.x xpath scrapy

I want to extract the word BIBA. My output looks like \n, \n, \nBIBA, \n but I only want "BIBA". Please help me get that name with XPath.

Thanks.

<div class="pdp-bname">
    <input type="hidden" value="/wishlist/getWishListData" id="miniWishlistFormActionUrl">
    <div class="prd-fav addToWishlist2">
        <form id="addToWishlistForm202180385_9607" class="wishlistPdpAddOrRemove" action="/wishlist/addOrRemoveWishlist/202180385_9607" method="POST"> <input type="hidden" value="5f49e2f4-9c05-4a5a-83b9-6edbc780cbe5" id="ajaxCSRF">
            <button type="submit" id="addwishlistId" class="go_link wishlistSubmitBtn wishlist ">
                 <!-- <label class="labletext">Add to wishlist</label> -->
            </button>
            <div>
                <input type="hidden" name="CSRFToken" value="5f49e2f4-9c05-4a5a-83b9-6edbc780cbe5">
            </div>
        </form>
    </div>
    BIBA
</div>

2 answers:

Answer 0 (score: 1)

I strongly recommend using Scrapy Item Loaders with input and output processors:

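import scrapy
from itemloaders.processors import MapCompose, TakeFirst  # scrapy.loader.processors in older Scrapy

def strip_word(value):
    # Remove the newlines and indentation around each extracted text node
    return value.strip()

class MyItem(scrapy.Item):
    my_word_field = scrapy.Field(
        # Input processor: runs on every extracted value (strip each text node)
        input_processor=MapCompose(strip_word),
        # Output processor: keep the first value that is not None or ""
        output_processor=TakeFirst()
    )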

Answer 1 (score: 0)

How about this:

response.xpath('normalize-space(//div[@class="pdp-bname"])').get()