Question

team = hxs.select('//table[@class="tablehead"/tbody/tr[contains[.@class,"player"]')

The website whose table I want to select has the following structure:

<html>
 <body>
  <table>
   <tbody>
    <tr>
     <td>...</td>
     <td>...</td>
       ...
    </tr>
   </tbody>
  </table>
 </body>
</html>

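Since the site contains multiple tables, I only want to select the table whose class is "tablehead". Within that table, I only want to select the tr tags whose class attribute contains the string "player". My attempt above looks a bit shaky to begin with. When I ran the crawler, it reported that the line above is an invalid XPath expression. Any suggestions would be appreciated.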

Answer 1

I have run into this kind of problem before; try omitting tbody from the XPath expression.
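
A small sketch of why dropping tbody can matter, assuming the raw HTML served by the site has no tbody element (browsers often add one when they render a table, so it can appear in developer tools but not in the downloaded source). The markup below is hypothetical, and Python's standard-library ElementTree stands in for the Scrapy selector so that the snippet runs on its own:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as downloaded by the crawler: note there is no <tbody>.
RAW = """
<html><body>
  <table class="tablehead">
    <tr class="major-player"><td>player1</td><td>player2</td></tr>
    <tr class="colhead"><td>Name</td><td>Team</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(RAW)

# A path that requires tbody finds nothing in this markup.
with_tbody = root.findall(".//table[@class='tablehead']/tbody/tr")

# The same path without the tbody step reaches the rows.
without_tbody = root.findall(".//table[@class='tablehead']/tr")

print(len(with_tbody), len(without_tbody))
```

If the page really does contain tbody, the path with tbody is the one that matches, so it is worth checking the raw response rather than the browser's view of it.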

Answer 2

//table[@class="tablehead"/tbody/tr[contains[.@class,"player"]

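Correct this to the following (the first predicate was missing its closing ], and contains() is a function, so it takes parentheses rather than brackets):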

//table[@class='tablehead']/tbody/tr[contains(@class, 'player')]

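This selects every tr whose class attribute has a string value containing the string "player", where that tr is a child of a tbody that is a child of any table in the XML document whose class attribute has the string value "tablehead".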
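
The two conditions described above can also be sketched in Python. This is only an illustration under some assumptions: the page content is made up, and ElementTree from the standard library is used instead of Scrapy. ElementTree's XPath subset has no contains(), so that predicate is emulated with a substring test in Python:

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two tables; only the first has class "tablehead".
DOC = """
<html><body>
  <table class="tablehead">
    <tbody>
      <tr class="colhead"><td>Name</td></tr>
      <tr class="major-player"><td>player1</td></tr>
      <tr><td>no class attribute</td></tr>
    </tbody>
  </table>
  <table class="other">
    <tbody>
      <tr class="player"><td>ignored</td></tr>
    </tbody>
  </table>
</body></html>
"""

root = ET.fromstring(DOC)

# Step 1: tr children of a tbody that is a child of a "tablehead" table.
rows = root.findall(".//table[@class='tablehead']/tbody/tr")

# Step 2: emulate contains(@class, 'player'); a missing class acts like "".
players = [tr for tr in rows if "player" in tr.get("class", "")]

for tr in players:
    print(tr.get("class"), [td.text for td in tr.findall("td")])
```

Note that contains() is a plain substring match, so a class such as "major-player" matches just as "player" would; the row in the second table is skipped only because its table is not "tablehead".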

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
   <xsl:copy-of select=
    "//table[@class='tablehead']
        /tbody/tr[contains(@class, 'player')]
    "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document (made slightly more realistic):

<html>
    <body>
        <table class="tablehead">
            <tbody>
                <tr class="major-player">
                    <td>player1</td>
                    <td>player2</td>
                </tr>
            </tbody>
        </table>
    </body>
</html>

the XPath expression is evaluated and the selected nodes (just one in this case) are copied to the output:

<tr class="major-player">
   <td>player1</td>
   <td>player2</td>
</tr>

XPath syntax in Scrapy
