Question

I'm processing a table row by row and need to pick up the id of each row:

<table id="tbl">
  <tr id="row_1">
    <td id="cell_1">...</td>
  </tr>
  <tr id="row_2">
    <td id="cell_2">...</td>
  </tr>
</table>

So my code looks like this:

def parse_table(self, response):
    rows = response.css('#tbl > tr')
    for row in rows:
        rowid = row.css('::attr(id)')
        if rowid.extract_first().startswith('row'):
            ...

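However, with this approach the second call to .css() gives me the ids of all of the row's descendants, not just its direct children. For the example HTML above, it returns both "cell_1" and "row_1". How can I scope a chained css() call so that it applies only to the direct children of the given row?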

I tried the :scope pseudo-class, but Scrapy doesn't seem to support it, and :root gave me no results.

Alternatively, can I get the value of the id attribute without going through CSS at all?

Answer 1

I can show you how to do the same thing with XPath:

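def parse_table(self, response):
    for row in response.xpath('//*[@id="tbl"]/tr'):
        # './@id' is relative to the current <tr>, so only the row's own id is read
        rowid = row.xpath('./@id').extract_first()
        # extract_first() returns None for a row without an id, so guard first
        if rowid and rowid.startswith('row'):
            ...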
