Question

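When I view the HTML in the browser, I see the markup shown below (minus the asterisks on lines 3 and 4). However, when I scrape the page and print the HTML in the scrapy shell, the lines marked with *** are not there. Why is that? Also, how can I get the text of the cell with colspan="2"? Thanks. This is what I have been trying: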

response.xpath('//table[@id="playertable_0"]/tbody/tr/th[@colspan="2"]//text()')

The actual URL I am using is: http://games.espn.com/ffl/leaders?&scoringPeriodId=1&seasonId=2018. To get the HTML below, I am running the following code:

table = response.xpath('//table[@id="playertable_0"]')
table.css('tr.playerTableBgRowHead.tableHead.playertableSectionHeader').extract()


    <tr class="playerTableBgRowHead tableHead playertableSectionHeader">
        <th colspan="1" class="playertableSectionHeaderFirst">OFFENSIVE PLAYERS</th>
        ***<td class="sectionLeadingSpacer"></td>***
        ***<th colspan = "2" > WK 1 </th> == $0***
        <td class="sectionLeadingSpacer"></td>
        <th colspan="4">PASSING</th>
        <td class="sectionLeadingSpacer"></td>
        <th colspan="3">RUSHING</th>
        <td class="sectionLeadingSpacer"></td>
        <th colspan="4">RECEIVING</th>
        <td class="sectionLeadingSpacer"></td>
        <th colspan="3">MISC</th><td class="sectionLeadingSpacer">
        </td><th colspan="1">TOTAL</th>
    </tr>'

Answer 1

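It is not the <tr> or <th> tag that is injected through JS. It is <tbody>, which is absent from the HTML that Scrapy receives, so an XPath with an explicit /tbody/ step matches nothing. Hence the following XPath works:

response.xpath('//table[@id="playertable_0"]//tr/th[@colspan="2"]//text()')

Viewing the page source in the browser will tell you what is returned in the HTML, as opposed to what is injected from JS.

See the page source (View Page Source in the Chrome browser).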
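The difference between the two XPath expressions can be reproduced offline. The following is a minimal sketch, not the actual ESPN response: `RAW_HTML` is a hypothetical, trimmed-down copy of the table with no <tbody> wrapper, and the standard library's `xml.etree.ElementTree` stands in for Scrapy's selectors so the example runs without any network access or third-party packages.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample (an assumption, not the real page): the rows sit
# directly under <table>, with no <tbody> element in between.
RAW_HTML = """
<html><body>
<table id="playertable_0">
  <tr class="playerTableBgRowHead tableHead playertableSectionHeader">
    <th colspan="1">OFFENSIVE PLAYERS</th>
    <td class="sectionLeadingSpacer"></td>
    <th colspan="2">WK 1</th>
    <td class="sectionLeadingSpacer"></td>
    <th colspan="4">PASSING</th>
  </tr>
</table>
</body></html>
"""

root = ET.fromstring(RAW_HTML)

# Path with an explicit tbody step: nothing matches, because the raw
# markup has no <tbody> element.
with_tbody = root.findall(
    ".//table[@id='playertable_0']/tbody/tr/th[@colspan='2']"
)

# Path that searches for <tr> at any depth below the table instead.
without_tbody = root.findall(
    ".//table[@id='playertable_0']//tr/th[@colspan='2']"
)

tbody_texts = [th.text.strip() for th in with_tbody]
texts = [th.text.strip() for th in without_tbody]

print(tbody_texts)  # []
print(texts)        # ['WK 1']
```

In the scrapy shell the same idea applies: replacing /tbody/tr with //tr in the original expression removes the dependency on an element that is only visible in the browser's rendered DOM.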

Finding the missing HTML
