Question

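I am using Beautiful Soup to scrape a web page from Wikipedia. The page has several tables, and I am trying to access one specific table. Its class name is "wikitable", but a few other tables share the same class name. When I use the code below, I get the first table on the page, but I need the second one.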

my_table = str(soup.find("table","wikitable"))

I also tried using the caption, but that did not work.

soup.find("caption", text="Demographics of student body").find_parent("table")

I get the error "AttributeError: 'NoneType' object has no attribute 'find_parent'".

Below is the HTML of the table I am trying to access.

<table style="text-align:center; float:left; font-size:85%; margin-right:2em;" class="wikitable">
<caption><i>Demographics of student body</i><sup id="cite_ref-Head_count_124-0" class="reference"><a href="#cite_note-Head_count-124">[124]</a></sup><sup id="cite_ref-125" class="reference"><a href="#cite_note-125">[125]</a></sup><sup id="cite_ref-126" class="reference"><a href="#cite_note-126">[126]</a></sup></caption>

I would appreciate any guidance. I am using Python 3.

Thanks

Answer 1

The find method returns only the first match. Use find_all instead and pick the second item.

my_table = soup.find_all("table", class_="wikitable")[1]

If you prefer CSS selectors:

my_table = soup.select('table.wikitable')[1]
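Both approaches can be tried on a small stand-in document. The sketch below is only an illustration: the HTML snippet, the id attributes, and the helper name nth_wikitable are made up for the example and are not taken from the Wikipedia page. The helper returns None rather than raising IndexError when the page has fewer tables than expected.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia page: two "wikitable" tables
# plus one unrelated table. The ids exist only to tell the tables apart.
html = """
<table class="wikitable" id="first"><tr><td>A</td></tr></table>
<table class="wikitable" id="second"><tr><td>B</td></tr></table>
<table class="other" id="third"><tr><td>C</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")


def nth_wikitable(soup, index):
    """Return the wikitable at the given zero-based index, or None if absent."""
    tables = soup.find_all("table", class_="wikitable")
    return tables[index] if 0 <= index < len(tables) else None


second_by_find_all = nth_wikitable(soup, 1)
second_by_select = soup.select("table.wikitable")[1]
missing = nth_wikitable(soup, 5)  # the page has no sixth wikitable
```

Indexing by position is fragile if the page layout changes, so checking the result for None before using it is a reasonable precaution.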

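The AttributeError is raised because the string you are searching for belongs to the 'i' tag inside 'caption', not to the 'caption' tag itself, so find returns None. If you search for the 'i' tag instead, it works.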

my_table = soup.find("i", string="Demographics of student body").find_parent("table")
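To see why the caption search fails, and as an alternative that does not depend on the 'i' tag, you can compare against the caption's full text. This is a minimal sketch on a shortened copy of the question's HTML; the helper name find_table_by_caption is hypothetical, and it assumes a substring match on the caption text is specific enough for the page.

```python
from bs4 import BeautifulSoup

# Shortened version of the table from the question (one citation kept).
html = """
<table class="wikitable">
<caption><i>Demographics of student body</i><sup class="reference"><a href="#cite_note-124">[124]</a></sup></caption>
<tr><td>row</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The caption has more than one child (<i> and <sup>), so its .string is
# None and a string= filter on the caption tag cannot match.
caption = soup.find("caption")
direct = soup.find("caption", string="Demographics of student body")


def find_table_by_caption(soup, needle):
    """Return the first table whose caption text contains needle, else None."""
    for cap in soup.find_all("caption"):
        if needle in cap.get_text():
            return cap.find_parent("table")
    return None


table = find_table_by_caption(soup, "Demographics of student body")
```

Because the helper returns None when nothing matches, calling find_parent on a missing result (the cause of the original error) is avoided.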

Web scraping in Python using Beautiful Soup to find a specific table
