Question

I'm trying to scrape a list of countries from a Wikipedia page, without success.

Here is the relevant HTML from this page:

<table class="wikitable sortable jquery-tablesorter">
<thead>
<tbody>
<tr>
<td>

Here is my code:

url = "https://en.wikipedia.org/wiki/List_of_countries_by_average_elevation"
soup = BeautifulSoup(read_url(url), 'html.parser')

table = soup.find("table", {"class":"wikitable"})
tbody = table.find("tbody")   
rows = tbody.find("tr")  # <--- this gives the error, saying tbody is None

countries = []
altitudes = []

for row in rows:
    cols = row.findAll('td')
    for td in cols:
        if td.a:
            countries.append(td.a.text)
        elif "m (" in td.text:
            altitudes.append(float(td.text.split("m")[0].replace(",", "")))

Here is the error:

Traceback (most recent call last):
  File "wiki.py", line 18, in <module>
    rows = tbody.find("tr")
AttributeError: 'NoneType' object has no attribute 'find'

Then I tried selecting the rows directly with soup.find('tr').

That results in a NavigableString error. Is there another way I can retrieve the information as pairs?

Answer 1

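If you go to the page source and search for tbody, you get 0 results, so that is probably the cause of your first problem. Wikipedia appears to use a custom <table class="wikitable sortable"> without specifying a tbody.

For your second problem, you need to use find_all instead of find, because find only returns the first tr. So you want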
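To illustrate the missing tbody, here is a minimal sketch. The HTML string is a made-up stand-in for the raw page source (the country name and figures are placeholders, not real data), and it assumes the thead/tbody seen in the browser inspector are added after the page loads rather than being present in the downloaded HTML. With html.parser the tree is built as written, so looking up tbody returns None and the rows have to be read from the table itself.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the raw page source: note there is no <thead> or <tbody>.
html = """
<table class="wikitable sortable">
<tr><th>Country</th><th>Elevation</th></tr>
<tr><td><a href="/wiki/Exampleland">Exampleland</a></td><td>1,234 m (4,049 ft)</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable"})

# html.parser does not insert a <tbody> that is absent from the source.
tbody = table.find("tbody")
print(tbody)

# Fall back to the table itself when there is no <tbody>.
container = tbody if tbody is not None else table
rows = container.find_all("tr")
print(len(rows))
```

Falling back to the table keeps the same code working whether or not a tbody happens to be present.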

rows = soup.find_all("tr")
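As a rough illustration of where the NavigableString comes from (my reading of the error, using a made-up table): find returns a single Tag, and looping over a Tag walks its children, which include the whitespace strings between the cells. find_all returns a list of Tags, so each item can be searched for td cells.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Made-up table with newlines between the tags, as in typical page source.
html = (
    "<table>\n"
    "<tr>\n<td>A</td>\n<td>1</td>\n</tr>\n"
    "<tr>\n<td>B</td>\n<td>2</td>\n</tr>\n"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

first_row = soup.find("tr")     # a single Tag (the first row only)
all_rows = soup.find_all("tr")  # a list-like collection of every row Tag

# Looping over a Tag yields its children: both the <td> Tags and the
# whitespace strings between them.
string_children = [c for c in first_row if isinstance(c, NavigableString)]
tag_children = [c for c in first_row if isinstance(c, Tag)]
print(len(string_children), len(tag_children))

# Looping over the find_all result yields row Tags, which can be searched.
cells = [[td.get_text(strip=True) for td in row.find_all("td")] for row in all_rows]
print(cells)
```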

Hope this helps :)

Answer 2

The code below works for me:

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_countries_by_average_elevation"

response = requests.get(url)

soup = BeautifulSoup(response.content, 'html.parser')
table = soup.find('table')


countries = []
altitudes = []

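# Skip the header row, then read the first two cells of each data row.
for row in table.find_all('tr')[1:]:
    col = row.find_all('td')
    country = col[0].text.strip()
    # Keep the part before "m", then drop whitespace and thousands separators.
    elevation = float(''.join(col[1].text.split("m")[0].split()).replace(',', ''))
    countries.append(country)
    altitudes.append(elevation)

print(countries, '\n', altitudes)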
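Since the question asks about retrieving the information as pairs, here is one possible sketch of that step using only the standard library. The parse_elevation helper and the sample cell texts are hypothetical (the names and numbers are placeholders, not values from the page); it assumes each elevation cell looks like "1,234 m (4,049 ft)" and skips anything that does not.

```python
import re

def parse_elevation(text):
    """Return the metre value from text like '1,234 m (4,049 ft)', or None."""
    # Hypothetical helper: a number with optional thousands separators,
    # followed by optional whitespace and a standalone "m".
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*m\b", text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))

# Placeholder (country, elevation text) cells, as they might come out of a row.
cells = [
    ("Exampleland", "1,234 m (4,049 ft)"),
    ("Samplestan", "56 m (184 ft)"),
    ("Nowhere", "n/a"),
]

# Keep the country and its elevation together, dropping unparseable rows.
pairs = [(name, parse_elevation(text)) for name, text in cells]
pairs = [(name, metres) for name, metres in pairs if metres is not None]
elevation_by_country = dict(pairs)
print(elevation_by_country)
```

Keeping each country and elevation in one tuple avoids the two separate lists drifting out of step when a row is skipped.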

Trying to select rows in a table, always getting a NavigableString error
