Question

I am trying to use BeautifulSoup to get information from a Wikipedia table. Now I am stuck, because I cannot iterate over an object.

Here is the code:

import requests
from bs4 import BeautifulSoup

url='http://de.wikipedia.org/wiki/Liste_der_in_der_Europ%C3%A4ischen_Union_zugelassenen_Lebensmittelzusatzstoffe'
raw_data=requests.get(url)
soup=BeautifulSoup(raw_data.content)
table= soup.find_all("table",{"class":"wikitable sortable"})

for i in table:
    print i.contents[i].find_all("td")

This is the error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: list indices must be integers, not Tag

If I wrap it in try: / except:, nothing gets printed at all.

Can anyone help me?

Thanks a lot!

Answer 1

You are right: in general, you can skip TypeError, AttributeError and IndexError by handling the exceptions.

But the error here is:

TypeError: list indices must be integers, not Tag

and it is caused by this expression:

i.contents[i]

Here i is not an integer but a BeautifulSoup Tag element, so it cannot be used as an index into the list i.contents.

>>> type(i)
<class 'bs4.element.Tag'>
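To see the same failure without any network access, here is a minimal sketch. FakeTag is a hypothetical stand-in I made up for bs4.element.Tag (it only mimics the .contents list), so treat it as an illustration of the indexing rule rather than of BeautifulSoup itself:

```python
class FakeTag(object):
    """Hypothetical stand-in for bs4.element.Tag: just a name and a children list."""

    def __init__(self, name, contents=None):
        self.name = name
        self.contents = contents or []


# Mimics the shape of what the loop in the question iterates over.
table = [FakeTag("table", [FakeTag("tr"), FakeTag("tr")])]

for i in table:
    try:
        # Same mistake as in the question: the loop variable is an object,
        # not a position, so it cannot index the list.
        i.contents[i]
    except TypeError as exc:
        message = str(exc)

    # An integer index works as expected.
    first = i.contents[0]

print(message)
print(first.name)
```

The exact wording of the message differs a little between Python versions, but it always complains that list indices must be integers.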

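So I believe you are trying to find all the td tags inside each table. When you loop over table, i is already the table element itself, so simply call find_all on it to get all the td elements: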

i.find_all("td")

So your code should be:

import requests
from bs4 import BeautifulSoup

url = 'http://de.wikipedia.org/wiki/Liste_der_in_der_Europ%C3%A4ischen_Union_zugelassenen_Lebensmittelzusatzstoffe'
raw_data = requests.get(url)
# Naming the parser explicitly avoids bs4's "no parser specified" warning.
soup = BeautifulSoup(raw_data.content, "html.parser")
table = soup.find_all("table", {"class": "wikitable sortable"})

# Each i is already a <table> Tag, so call find_all on it directly.
for i in table:
    print(i.find_all("td"))
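If you want the cell text rather than the raw td tags, something along these lines may help. It is only a sketch: the HTML string is a tiny made-up sample in the shape of a wikitable (not the real page), and the names html and rows are my own.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the Wikipedia page (assumption).
html = """
<table class="wikitable sortable">
  <tr><th>Nummer</th><th>Name</th></tr>
  <tr><td>E 100</td><td>Kurkumin</td></tr>
  <tr><td>E 101</td><td>Riboflavin</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table", {"class": "wikitable sortable"})

rows = []
for table in tables:
    for tr in table.find_all("tr"):
        # Collect the text of every <td> in this row.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)

print(rows)
```

Going row by row keeps the cells of one table row together, which is usually what you want before writing the data somewhere else.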

Answer 2

for i in table:
    print i.contents[i].find_all("td")

Here i is an element of the list table; it is not an integer, so we cannot write contents[i].

You may want to try something like this:

import requests
from bs4 import BeautifulSoup

url = 'http://de.wikipedia.org/wiki/Liste_der_in_der_Europ%C3%A4ischen_Union_zugelassenen_Lebensmittelzusatzstoffe'
raw_data = requests.get(url)
# Naming the parser explicitly avoids bs4's "no parser specified" warning.
soup = BeautifulSoup(raw_data.content, "html.parser")
table = soup.find_all("table", {"class": "wikitable sortable"})

for i in table:
    print(i.find_all("td"))

And here is a version that stays closer to the way you were doing it :-)

for i in table:
    for c in i.contents:
        # Plain text children have no find_all, so skip them.
        try:
            print(c.find_all("td"))
        except AttributeError:
            pass
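As an alternative to catching the exception, you could check the type of each child first. This is a rough sketch on a made-up two-row table (the html string and the names found and skipped are assumptions for illustration):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up markup; the newline between the rows becomes a text child.
html = "<table><tr><td>a</td></tr>\n<tr><td>b</td></tr></table>"
table = BeautifulSoup(html, "html.parser").table

found = []
skipped = 0
for child in table.contents:
    if isinstance(child, Tag):
        # Only Tag children can be searched with find_all.
        found.extend(td.get_text() for td in child.find_all("td"))
    else:
        # Text nodes such as the "\n" between rows end up here.
        skipped += 1

print(found)
print(skipped)
```

The explicit check makes it obvious which children are ignored, whereas a broad except can also hide unrelated bugs.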

Hope that helps :-)

Answer 3

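When you wrote

i.contents[i].find_all("td")

what did you expect i to be? What did you expect i.contents to be? And why did you expect to be able to use i itself as an index into i.contents?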

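You should go back and read the documentation carefully, and work out exactly what soup.find_all returns, so that you understand the values i takes on as you iterate.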
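If it helps, the types can also be checked directly in a few lines. This is a small sketch on a one-cell table I made up, not on the Wikipedia page:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up one-cell table, parsed with the stdlib-backed parser.
soup = BeautifulSoup("<table><tr><td>x</td></tr></table>", "html.parser")

result = soup.find_all("table")
first = result[0]

# find_all returns a list-like collection of Tag objects ...
print(isinstance(result, list))
print(isinstance(first, Tag))
# ... and a Tag's .contents is a plain list of its direct children.
print(isinstance(first.contents, list))
```

So iterating over the result yields Tag objects, and .contents needs an integer position (or a slice), never one of those Tag objects.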

How to iterate while skipping TypeError, AttributeError and IndexError in Python?
