Question

I have the following table:

<table class="information">
  <tr> .... lots of rows with <th> and <td></tr>
  <tr>
   <th>Nationality</th>
   <td><a href="..">Stackoverflowian</a></td>
  </tr>
</table>

I want to get the text inside the td tag that belongs to "Nationality". How do I navigate there? I'm using BeautifulSoup and Python.

Edit: there are many th and td tags above this row, so simply taking the first one won't work.

Answer 1

Find the th tag, then get its next sibling:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html holds the table markup
ths = soup.find_all('th')
for th in ths:
    if th.text == "Nationality":
        print(th.next_sibling.next_sibling.text)

# Stackoverflowian

We need to call next_sibling twice because the first call returns the newline (a whitespace text node) between the th and the td.
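To see why the double next_sibling is needed, here is a small self-contained sketch using the question's markup (assuming the default "html.parser" builder, which keeps whitespace between tags). It also shows find_next_sibling("td"), a BeautifulSoup method that skips text nodes and returns the next matching tag, so it does not depend on how the HTML is indented.

```python
from bs4 import BeautifulSoup

# Same markup as in the question, with newlines between the tags
html = """
<table class="information">
  <tr>
   <th>Nationality</th>
   <td><a href="..">Stackoverflowian</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
th = soup.find("th", string="Nationality")

# The first sibling is the whitespace text node between </th> and <td>
print(repr(th.next_sibling))

# find_next_sibling("td") skips text nodes and returns the next <td> element
td = th.find_next_sibling("td")
print(td.get_text())
```

With this markup, th.next_sibling.next_sibling and th.find_next_sibling("td") point at the same td; the second form keeps working if the whitespace disappears.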

Answer 2

I've revised this answer since you gave the specific HTML page you're trying to parse.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://en.wikipedia.org/wiki/Usain_Bolt")
r.raise_for_status()  # make sure the page loaded successfully
soup = BeautifulSoup(r.text, "html.parser")

thTag = soup.find('th', string='Nationality')
tdTag = thTag.find_next_sibling('td')  # skips any whitespace text nodes

print(tdTag.text)
# Jamaican
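If the page does not contain a "Nationality" row, soup.find returns None and the next attribute access raises AttributeError. A guarded variant, as a sketch: value_for_label is a hypothetical helper (not part of BeautifulSoup), and the sample row is assumed markup written with no whitespace between the cells, as generated pages often are. In that case th.next_sibling is already the td, so find_next_sibling("td") is the safer call.

```python
from bs4 import BeautifulSoup


def value_for_label(soup, label):
    """Hypothetical helper: text of the <td> following the <th> whose
    string equals `label`, or None if either cell is missing."""
    th = soup.find("th", string=label)
    if th is None:
        return None
    td = th.find_next_sibling("td")
    return td.get_text(strip=True) if td is not None else None


# Assumed sample row, written without whitespace between the cells
html = '<table><tr><th scope="row">Nationality</th><td>Jamaican</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

print(value_for_label(soup, "Nationality"))  # Jamaican
print(value_for_label(soup, "Height"))       # None
```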

Answer 3

If you are looking for the table itself, consider find_parent().
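A minimal sketch of that idea, assuming the question's markup: find the th by its text, climb to the enclosing table with find_parent("table"), then read every th/td pair in that table into a dict. The dict-building loop is an illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<table class="information">
  <tr>
   <th>Nationality</th>
   <td><a href="..">Stackoverflowian</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
th = soup.find("th", string="Nationality")

# Walk up from the header cell to the <table> that contains it
table = th.find_parent("table")
print(table["class"])  # ['information']

# Collect each row's header/value pair from that table
rows = {}
for tr in table.find_all("tr"):
    th_cell = tr.find("th")
    td_cell = tr.find("td")
    if th_cell and td_cell:
        rows[th_cell.get_text(strip=True)] = td_cell.get_text(strip=True)

print(rows)  # {'Nationality': 'Stackoverflowian'}
```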

Navigating a table using th text
