Question

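I want to get the text from the "h1" tags. I'm using BeautifulSoup, and it works fine until an "article" tag has no "h1" tag inside it; then I get a "'NoneType' object has no attribute 'contents'" error. Here is the code: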

from bs4 import BeautifulSoup

page = """<article>
    <a href="http://something">
    </a>   <!-- missing "h1" -->
    <a href="http://something">
    </a>
    </article>
    <article>
    <a href="http://something">
    </a>
    <a href="http://something">
       <h1>something</h1>
    </a>
    </article>
    <article>
    <a href="http://something">
    </a>
    <a href="http://something">
       <h1>something</h1>
    </a>
    </article>"""

soup = BeautifulSoup(page, "lxml")

h1s = []

articles = soup.find_all("article")


for i in range(1, len(articles)):
    # Raises AttributeError when an article has no <h1>, because articles[i].h1 is then None
    h1s.append(articles[i].h1.contents)

This is what I get when I check an article without an h1 tag and one with it:

type(articles[0].h1) 
<type 'NoneType'>
type(articles[1].h1)
<class 'bs4.element.Tag'>

Answer 1

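You should loop over articles, which is a list, use the find_all() method to get all the h1 tags inside each article, and then append each one's text to h1s. That seems to be what you want.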
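The answer's original code is not preserved here, so the following is only a minimal sketch of the approach it describes, not the answerer's exact snippet. It assumes a shortened copy of the sample page from the question and uses the built-in "html.parser" instead of "lxml" so that no extra parser needs to be installed. That swap is an assumption made for portability and should not change the result for this markup.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's sample page; the first article has no <h1>.
page = """
<article>
  <a href="http://something"></a>
  <a href="http://something"></a>
</article>
<article>
  <a href="http://something"></a>
  <a href="http://something"><h1>something</h1></a>
</article>
<article>
  <a href="http://something"></a>
  <a href="http://something"><h1>something</h1></a>
</article>
"""

# Assumption: "html.parser" stands in for the "lxml" parser used in the question.
soup = BeautifulSoup(page, "html.parser")

h1s = []
for article in soup.find_all("article"):
    # find_all() returns an empty list when an article has no <h1>,
    # so the inner loop is simply skipped instead of raising AttributeError.
    for h1 in article.find_all("h1"):
        h1s.append(h1.text)

print(h1s)  # ['something', 'something']
```

If at most one h1 per article is expected, an alternative is to keep `article.h1` and skip the article when it is `None`; the find_all() version avoids that explicit check.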

BeautifulSoup - missing tag inside a tag
