How do I extract specific <li> elements using XPath or CSS?

Time: 2018-06-21 15:13:24

Tags: python css xpath scrapy web-crawler

How can I extract the information in the span found inside an li?

<div class="col-md-offer-content">
 <ul class="params-list">
  <li>
   <ul class="main-list">
    <li>Preço 
       <span><strong>350 €</strong></span> 6 €/m²</li>
    <li>Área útil (m²) 
       <span><strong>60 m²</strong></span></li>
    <li>Tipologia 
       <span><strong>T1</strong></span></li>
   </ul>

I wrote this code:

response.xpath('//ul[@class="main-list"]/li[span="T1"]/text()').extract()

and the output is: ['Tipologia']

But in my case I want it to return T1, so I tried:

response.xpath('//ul[@class="main-list"]/span[li="Tipology"]/text()').extract()

But it returns nothing... What am I doing wrong? Any suggestions?

2 Answers:

Answer 0: (score: 1)

You may be able to do this:

response.xpath('//ul[@class="main-list"]/li[span="T1"]/span/strong/text()').extract()

OR

//ul[@class='main-list']//li[3]//span

OR

If you add a class to the span:

<div class="col-md-offer-content">
 <ul class="params-list">
  <li>
   <ul class="main-list">
    <li>Preço 
       <span><strong>350 €</strong></span> 6 €/m²</li>
    <li>Área útil (m²) 
       <span><strong>60 m²</strong></span></li>
    <li>Tipologia 
       <span class="thisSpan"><strong>T1</strong></span></li>
   </ul>

Then use an XPath like this:

response.xpath('//ul[@class="main-list"]//span[@class="thisSpan"]//text()').extract()

In CSS (Selenium, Java): driver.findElement(By.cssSelector("ul.main-list span.thisSpan"));

Answer 1: (score: 0)
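The three ideas in this answer (match by the span's text, by position, or by an added class) can be tried without a running spider. The sketch below is only an illustration: it uses Python's standard-library `xml.etree.ElementTree`, which supports a subset of XPath, as a stand-in for Scrapy's `response`. The closing tags added to the sample markup and the class name `thisSpan` are assumptions made so the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, with the outer tags closed so it parses
# as XML, and the hypothetical class "thisSpan" added to the last span.
HTML = """
<div class="col-md-offer-content">
 <ul class="params-list">
  <li>
   <ul class="main-list">
    <li>Preço <span><strong>350 €</strong></span> 6 €/m²</li>
    <li>Área útil (m²) <span><strong>60 m²</strong></span></li>
    <li>Tipologia <span class="thisSpan"><strong>T1</strong></span></li>
   </ul>
  </li>
 </ul>
</div>
"""

root = ET.fromstring(HTML)

# By content: the <li> whose child <span> has the text "T1".
# The <li>'s own text is the label, which is why the question's first
# query returned the label rather than the value.
li = root.find(".//ul[@class='main-list']/li[span='T1']")
label = li.text.strip()
by_content = li.find("span/strong").text

# By position: the third <li> of the inner list.
by_position = root.find(".//ul[@class='main-list']/li[3]/span/strong").text

# By class: the span that carries the added class.
by_class = root.find(".//span[@class='thisSpan']/strong").text

print(label, by_content, by_position, by_class)
```

In each case the value lives in the `<strong>` nested inside the `<span>`, so the path has to step down through `li`, then `span`, then `strong`, rather than taking the text of the `<li>` itself.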

xpath = '//ul[@class="main-list"]//li[3]//text()'
val = response.xpath(xpath).extract_first()

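You will get only the value of the third li. If you want a list with the value of every li, try:

xpath = '//ul[@class="main-list"]//li//text()'
li_vals_list = response.xpath(xpath).extract()

With this approach you get the values of every li, and you can then pick the one you need by index :)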
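Picking values by index breaks if the order of the items changes, so one alternative is to pair each label with its value. This is a minimal sketch of that idea, again using the standard-library `xml.etree.ElementTree` in place of Scrapy's `response` (an assumption made to keep it self-contained). The name `params` is illustrative only.

```python
import xml.etree.ElementTree as ET

# Inner list from the question, reduced to a well-formed fragment.
HTML = """<ul class="main-list">
 <li>Preço <span><strong>350 €</strong></span> 6 €/m²</li>
 <li>Área útil (m²) <span><strong>60 m²</strong></span></li>
 <li>Tipologia <span><strong>T1</strong></span></li>
</ul>"""

root = ET.fromstring(HTML)

params = {}
for li in root.findall("li"):
    # Text before the first child element is the label.
    label = (li.text or "").strip()
    # The value sits in the <strong> nested inside the <span>.
    value = li.findtext("span/strong")
    params[label] = value

print(params["Tipologia"])
```

Looking the value up by its label (`params["Tipologia"]`) keeps working even if another item is later inserted before it.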

Thanks.