Scrapy: extracting only one div when several divs are identical

Date: 2017-06-21 07:00:33

Tags: python html scrapy extract

I'm new to the world of Scrapy... can anyone help me?

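Do you know how to scrape only the first list of elements in this code (i.e. just the price facet, "Prix")? What I actually want is a list of price ranges with the number of products in each, but my code gives me everything: price, brand (I removed that part here), color (I removed that too), stars, and so on.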

<div id="facetsList" class="mgFacetContent">
 <div class="jsFacetListing mgFacetListing mgFOpen">
  <div class="jsFacetTitle mgFTitle">

#just here --->

   <span>Prix</span>

#<-----

   <span class="mgFIcon"></span>
     </div>
    <div class="mgFAllList">
     <input type="hidden" name="FacetForm.SelectedFacets.Index" value="0" />
     <ul class="mgFList">
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[0]" value="f/7/[_1200]">
         <span title="&lt;10 € (276)"><10 € (276)</span>
        </label>
      </li>
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[0]" value="f/7/[800_2500]">
         <span title="10 &#224; 20 € (314)">10 à 20 € (314)</span>
       </label>
      </li>
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[0]"  value="f/7/[1900_5500]">
        <span title="20 &#224; 50 € (404)">20 à 50 € (404)</span>
       </label>
      </li>
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[0]"  value="f/7/[4800_10500]">
        <span title="50 &#224; 100 € (232)">50 à 100 € (232)</span>
       </label>
      </li>
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[0]"  value="f/7/[9500_21500]">
        <span title="100 &#224; 200 € (259)">100 à 200 € (259)</span>
       </label>
      </li>   
     </ul>
     <ul class="mgFListMore">
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[0]" value="f/7/[19000_51500]">
        <span title="200 &#224; 500 € (161)">200 à 500 € (161)</span>
       </label>
      </li>
      <li>
       <label><input type="checkbox"  name="FacetForm.SelectedFacets[0]" value="f/7/[48000_110000]">
        <span title="500 &#224; 1000 € (56)">500 à 1000 € (56)</span>
       </label>
      </li>
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[0]" value="f/7/[90000_]">
        <span title="1000 € et + (22)">1000 € et + (22)</span>
       </label>
      </li>
     </ul>
    </div>
    <div class="mvFLink mgFLinkSeeMore jsFLink">de choix</div>
   </div>
   <div class="jsFacetListing mgFacetListing mgFOpen">
    <div class="jsFacetTitle mgFTitle">
     <span>Avis clients</span>
     <span class="mgFIcon"></span>
    </div>
    <div class="mgFAllList">
     <input type="hidden" name="FacetForm.SelectedFacets.Index" value="3" />
     <ul class="mgFList">
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[3]" value="f/374/[300_500]">
        <span title="3 &#233;toiles et + (77)">3 étoiles et + (77)</span>
       </label>
      </li>
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[3]" value="f/374/[400_500]">
        <span title="4 &#233;toiles et + (63)">4 étoiles et + (63)</span>
       </label>
      </li>
      <li>
       <label>
        <input type="checkbox"  name="FacetForm.SelectedFacets[3]" value="f/374/[500_500]">
        <span title="5 &#233;toiles (30)">5 étoiles (30)</span>
       </label>
      </li>
     </ul>
     <ul class="mgFListMore"></ul>
    </div>
   </div>

I tried a lot of things with XPath:

        if response.xpath('//div[@class="jsFacetListing mgFacetListing mgFOpen"]/div[@class="mgFAllList"]/ul/li/label/input[@name="FacetForm.SelectedFacets[0]"]'):
          nbproducts = response.xpath('/span/text()').re(r'\u20ac \s*(.*)')
          avgcost = response.xpath('../span/text()').re(r'\s*(.*)')

But I don't think it works like that...

Thanks a lot

1 answer:

Answer 0 (score: 1)

You can use an index in the XPath expression:

response.xpath('(//div[@class="jsFacetTitle mgFTitle"])[1]/span[1]/text()').extract()
['Prix']