Extracting specific values from HTML using XPath and Scrapy

Time: 2017-11-10 17:47:09

Tags: python html xpath scrapy

I have the following HTML code:

<tr data-live="COumykPG" data-dt="10,11,2017,19,00" data-def="1">
<td class="table-matches__tt"><span class="table-matches__time" data-live-cell="time">19:00</span><a href="/soccer/germany/oberliga-bremen/oberneuland-habenhauser/COumykPG/" data-live-cell="matchlink"><span>Oberneuland</span> - <span>Habenhauser</span></a></td>
<td class="livebet" data-live-cell="livebet">&nbsp;</td>
<td class="table-matches__streams" data-live-cell="score">
</td>
<td class="table-matches__odds" data-oid="2p2k5xv464x0x6ev9v"><a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv464x0x6ev9v&amp;otheroutcomes=2p2k5xv498x0x0,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">1.10</a></td>
<td class="table-matches__odds" data-oid="2p2k5xv498x0x0"><a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv498x0x0&amp;otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">7.44</a></td>
<td class="table-matches__odds" data-oid="2p2k5xv464x0x6eva0"><a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv464x0x6eva0&amp;otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv498x0x0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">12.40</a></td>
</tr>

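I am trying to scrape the 3 float values from this code: 1.10, 7.44 and 12.40. The expression I tried to use to get the values is:

response.xpath('//a/@target').extract()

The output I get is 'mySelections'

I would like to get the value next to it. What is the correct expression?

Thanks in advance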

1 Answer:

Answer 0: (score: 1)

What went wrong

response.xpath('//a/@target').extract()

Why?

  • If you format the HTML, the error becomes obvious.

    You want to extract the text of the a tag, not its target attribute.

            <tr data-live="COumykPG" data-dt="10,11,2017,19,00" data-def="1">
               <td class="table-matches__tt">
                  <span class="table-matches__time" data-live-cell="time">19:00</span>
                  <a href="/soccer/germany/oberliga-bremen/oberneuland-habenhauser/COumykPG/" data-live-cell="matchlink">
                  <span>Oberneuland</span> - <span>Habenhauser</span>
                  </a>
               </td>
               <td class="livebet" data-live-cell="livebet">&nbsp;</td>
               <td class="table-matches__streams" data-live-cell="score"></td>
               <td class="table-matches__odds" data-oid="2p2k5xv464x0x6ev9v">
    
               <a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv464x0x6ev9v&amp;otheroutcomes=2p2k5xv498x0x0,2p2k5xv464x0x6eva0" 
                  onclick="return my_selections_click('1x2', 'soccer');" 
                  title="Add to My Selections" 
                  target="mySelections">1.10</a>
    
               </td>
               <td class="table-matches__odds" data-oid="2p2k5xv498x0x0">
    
               <a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv498x0x0&amp;otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv464x0x6eva0" 
                  onclick="return my_selections_click('1x2', 'soccer');" 
                  title="Add to My Selections" 
                  target="mySelections">7.44</a>
    
               </td>
               <td class="table-matches__odds" data-oid="2p2k5xv464x0x6eva0">
    
               <a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv464x0x6eva0&amp;otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv498x0x0" 
                  onclick="return my_selections_click('1x2', 'soccer');" 
                  title="Add to My Selections" 
                  target="mySelections">12.40</a>
    
               </td>
            </tr>
    

    How to fix it

  • Use one of the following

    • response.xpath('//a/text()').extract()
    • According to other developers, response.xpath sometimes causes errors, and you should use Scrapy's Selector instead

      from scrapy.selector import Selector
      result_array = Selector(text=response.body).xpath('//a/text()').extract()