Question

How can I get the table rows that are nested inside other table and form tags? I tried some code, but none of it seems to work.

I used the Python code below, but it doesn't return anything:

def parse(self, response):
    t = response.xpath('//table[@class="DataGrid"]/tbody/tr')
    for tr_obj in enumerate(t):
        print(tr_obj.xpath('td[1]/text()').extract_first())

Below is the HTML. In it, I need to get the table whose class name is gridTable:

<html>
<body>
    <table></table>
    <table>
        <tbody>
            <tr>
                <td>
                    <span></span>
                    <script></script>
                    <form>
                        <table class="dPage1">
                            <tbody>
                                <tr></tr>
                                <tr>
                                    <td>
                                        <table>
                                            <tbody>
                                                <tr>
                                                    <td>
                                                        <table class="gridTable">

                                                        </table>
                                                    </td>
                                                </tr>
                                            </tbody>
                                        </table>
                                    </td>
                                </tr>
                            </tbody>
                        </table>
                    </form>
                </td>
            </tr>
        </tbody>
    </table>
</body>
</html>

Solution

# No tbody in the path; iterate over the selectors directly.
for tr_obj in response.xpath('//table[@class="DataGrid"]/tr'):
    print(tr_obj.xpath('td[1]/text()').extract_first())
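Independently of the XPath itself, the question's loop wraps the row selectors in `enumerate()`, which yields `(index, item)` tuples. Once the XPath does match, `tr_obj.xpath(...)` is therefore called on a tuple and raises `AttributeError`. The minimal sketch below shows this without needing Scrapy; `RowStandIn`, `broken_loop` and `fixed_loop` are hypothetical names for illustration, not part of Scrapy's API.

```python
class RowStandIn:
    """Hypothetical stand-in for a Scrapy row selector (not Scrapy's API)."""

    def __init__(self, first_cell_text):
        self.first_cell_text = first_cell_text

    def xpath(self, query):
        # A real selector would evaluate `query`; this stand-in ignores it
        # and returns the first cell's text directly.
        return self.first_cell_text


rows = [RowStandIn("A1"), RowStandIn("B1")]


def broken_loop(rows):
    """Mirrors the question's loop: each loop variable is an (index, row) tuple."""
    texts = []
    for tr_obj in enumerate(rows):
        texts.append(tr_obj.xpath("td[1]/text()"))  # tuples have no .xpath()
    return texts


def fixed_loop(rows):
    """Unpacks each tuple so .xpath() is called on the row itself."""
    texts = []
    for index, tr_obj in enumerate(rows):
        texts.append((index, tr_obj.xpath("td[1]/text()")))
    return texts


try:
    broken_loop(rows)
except AttributeError as exc:
    print("broken:", exc)

print("fixed:", fixed_loop(rows))
```

If the index is not needed, dropping `enumerate()` and iterating over the selector list directly works just as well.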

Answer 1

In XPath, you can pick out the specific element to follow by putting a condition in square brackets after the tag name.

For your example, it would be:

 //table[@class="gridTable"]/...
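To try the bracket predicate offline, the sketch below runs it against a trimmed copy of the question's markup using Python's standard-library `xml.etree.ElementTree` instead of Scrapy. This works only because the sample happens to be well-formed XML, and ElementTree supports just a subset of XPath. The two rows inside `gridTable` are made-up sample data. In a spider, the corresponding call would be along the lines of `response.xpath('//table[@class="gridTable"]//tr')`.

```python
import xml.etree.ElementTree as ET

# The question's markup, trimmed; the two rows inside gridTable are hypothetical sample data.
HTML = """
<html><body>
  <table></table>
  <table><tbody><tr><td>
    <span></span><script></script>
    <form>
      <table class="dPage1"><tbody>
        <tr></tr>
        <tr><td>
          <table><tbody><tr><td>
            <table class="gridTable">
              <tr><td>A1</td><td>A2</td></tr>
              <tr><td>B1</td><td>B2</td></tr>
            </table>
          </td></tr></tbody></table>
        </td></tr>
      </tbody></table>
    </form>
  </td></tr></tbody></table>
</body></html>
"""

root = ET.fromstring(HTML.strip())

# Without a predicate, .// matches every table at any depth.
all_tables = root.findall(".//table")

# The [@class='gridTable'] predicate narrows the match to the one table we want.
grid_tables = root.findall(".//table[@class='gridTable']")

# Text of the first cell in each row of gridTable.
first_cells = [tr.find("td").text for tr in grid_tables[0].findall("tr")]

print(len(all_tables), len(grid_tables), first_cells)
```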

Answer 2

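The Scrapy documentation recommends not using tbody elements in your XPath expressions: browsers insert tbody into tables when they render a page, so it is often missing from the raw HTML that Scrapy downloads.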

Try leaving them out, and/or work around them with /*/ or //.

Try something like:

def parse(self, response):
    # Get a Selector list for all rows
    sel_rows = response.xpath('//table[@class="DataGrid"]/tr')

    # loop over row selectors ...
    for sel_row in sel_rows:
        print(sel_row.xpath('td[1]/text()').extract_first())
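The tbody effect described in this answer can be reproduced with the standard-library `xml.etree.ElementTree` (used here instead of Scrapy so the sketch runs anywhere; it implements only a subset of XPath). Both snippets are hypothetical: `RAW` stands for the HTML as a server might send it, `RENDERED` for what a browser's inspector shows after inserting tbody, and `count_rows` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as the server might send it: no <tbody>.
RAW = '<div><table class="DataGrid"><tr><td>A1</td></tr><tr><td>B1</td></tr></table></div>'

# The same table as a browser's DOM inspector shows it: the browser inserted <tbody>.
RENDERED = ('<div><table class="DataGrid"><tbody>'
            '<tr><td>A1</td></tr><tr><td>B1</td></tr>'
            '</tbody></table></div>')


def count_rows(markup, path):
    """Number of elements matched by an ElementTree path expression."""
    return len(ET.fromstring(markup).findall(path))


# A path copied from the browser hard-codes tbody and misses the raw rows.
with_tbody = ".//table[@class='DataGrid']/tbody/tr"
# Using // lets the path match rows whether or not tbody is present.
descendant = ".//table[@class='DataGrid']//tr"

for name, markup in (("raw", RAW), ("rendered", RENDERED)):
    print(name, count_rows(markup, with_tbody), count_rows(markup, descendant))
```

In Scrapy, the robust path would be `response.xpath('//table[@class="DataGrid"]//tr')`. One trade-off: `//tr` also matches rows of any table nested inside `DataGrid`, so it is looser than a direct `/tr` step.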
