Question

This is my source code:

<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td align="center">
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td style="border-left: 1px solid rgb(153, 153, 153); border-right: 1px solid rgb(153, 153, 153);">
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<tr>
<td height="511">
<table width="100%" cellspacing="0" cellpadding="5" border="0" height="500">
<tbody>
<tr>
<td width="1%" valign="top" height="500">
<table width="100%" cellspacing="1" cellpadding="1" bordercolor="#CCCCCC" border="0" bgcolor="#FFFFFF" align="center">
<tbody>
<tr bgcolor="#BB375F" bordercolor="#CCCCCC">

How do I write an XPath to access the innermost <tr> tag?

Here is what I tried:

top_table = response.xpath("//table[4]/tbody/tr/td")
content_table = top_table.xpath("table")
print content_table

This is the output I get:

[<Selector xpath='table' data=u'<table width="100%" border="0" cellspaci'>]

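Basically, I am able to reach the second-to-last table in the first row, and the innermost table is the one I want to reach. I am not sure how to proceed or where I went wrong. Any help or suggestions are welcome. Thanks!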

Answer 1

"Basically, I am able to reach the second-to-last table in the first row, and the innermost table is the one I want to reach."

In general, one possible way to select the innermost table is to require that the candidate table has no descendant table elements:

//table[not(.//table)]

So I suggest trying something like this to get the tr/td cells from the innermost table:

top_table = response.xpath("//table[not(.//table)]/tbody/tr/td")

Python Scrapy - XPath for nested table tags
