Question

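I want the function to collect the titles (the alt attributes) of all img tags and the text of all h3 tags. The loop fails with:

"TypeError: 'NoneType' object is not subscriptable".

Can someone tell me what I am doing wrong?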


from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.historico.portugal.gov.pt/pt/o-governo/arquivo-historico/governos-constitucionais/gc18/composicao.aspx"
uClient = urlopen(url)
soup = BeautifulSoup(uClient.read(), "html.parser")

containers = soup.findAll("li")
container = containers[7]

for container in containers:
    name = container.img["alt"]
    j = container.findAll("h3", {"class":"mainForecolor"})
    job = j[0].text

    print("nome: " + name)
    print("cargo: " + job)

Answer 1

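You can retrieve the names as follows.

Each of your containers is of type <class 'bs4.element.Tag'>, so you can search inside it for another tag. Every img found that way is also a <class 'bs4.element.Tag'>, so you read its attributes with .get(). Looping over findAll('img') inside each li also means that li elements without an image are simply skipped instead of raising an error.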

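from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.historico.portugal.gov.pt/pt/o-governo/arquivo-historico/governos-constitucionais/gc18/composicao.aspx"
uClient = urlopen(url)
soup = BeautifulSoup(uClient.read(), "html.parser")

container = soup.find_all('li')

for c in container:
    # findAll returns an empty list when the li has no img, so nothing is subscripted on None
    for link in c.findAll('img'):
        # default to '' so that an img without an alt attribute does not break the concatenation
        print("name : " + link.get('alt', ''))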

Alternatively, if you want to skip the li tags entirely, you can find all img tags directly and work as follows.

container = soup.find_all('img')

for c in container:
    # default to '' so that an img without an alt attribute does not break the concatenation
    print("name : " + c.get('alt', ''))
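The question also asks for the text of the h3 tags, which the snippets above do not cover. A minimal sketch of how both values might be collected together is shown below. It runs on made-up inline HTML rather than the real page, so the markup layout (an img and an h3 with class mainForecolor inside the same li) is an assumption, and extract_members is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page (assumption: each member's <li>
# holds an <img> with alt text and an <h3 class="mainForecolor">).
SAMPLE_HTML = """
<ul>
  <li><a href="#">Menu entry without an image</a></li>
  <li><img src="a.jpg" alt="Nome Exemplo"><h3 class="mainForecolor">Cargo Exemplo</h3></li>
  <li><img src="b.jpg"><h3 class="mainForecolor">Cargo sem nome</h3></li>
</ul>
"""


def extract_members(html):
    """Hypothetical helper: return (name, job) pairs for complete <li> entries."""
    soup = BeautifulSoup(html, "html.parser")
    members = []
    for li in soup.find_all("li"):
        img = li.find("img")
        h3 = li.find("h3", class_="mainForecolor")
        # Skip any <li> that lacks the image, its alt text, or the heading.
        if img is None or h3 is None or not img.get("alt"):
            continue
        members.append((img["alt"], h3.get_text(strip=True)))
    return members


for name, job in extract_members(SAMPLE_HTML):
    print("nome: " + name)
    print("cargo: " + job)
```

Checking for None before subscripting keeps the loop from stopping at the first li that is not a member entry, such as a navigation item.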

Answer 2

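container.img does not exist, so it is None and cannot be subscripted as container.img['alt'].

Why doesn't container.img exist? container is a BeautifulSoup Tag object, and container.img looks for an img tag nested inside it. When the li contains no img, the result is None. Perhaps you meant to access an attribute of the tag itself with container['img']. Unfortunately, for the example you provided, the container tags in containers do not have any attributes.

See: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#tag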
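The behaviour described in this answer can be reproduced on a tiny made-up snippet, without fetching the real page. The HTML string below is an assumption chosen only to show one li without an img and one with it.

```python
from bs4 import BeautifulSoup

# Made-up markup: the first <li> has no <img>, the second one does.
html = '<li><a href="#">no image here</a></li><li><img alt="x"></li>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("li")

print(first.img)          # None, because there is no <img> inside the first <li>
print(second.img["alt"])  # x

try:
    first.img["alt"]      # subscripting None raises the error from the question
except TypeError as exc:
    print("TypeError:", exc)
```

So the error comes from the li elements that have no image, not from the ones that do.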

Web scraping - TypeError: 'NoneType' object is not subscriptable
