Question

I am trying to extract text from an .htm file in a Jupyter notebook. I first read the file with:

with open('Materials.htm') as b:
    file3 = b.readlines()
file3 = ''.join(file3)

Then I parse the file and call get_text().

from bs4 import BeautifulSoup

Stock_page = BeautifulSoup(file3, 'lxml')
for movers_name in Stock_page('td', style="text-align:left;"):
    movers = list()
    movers.append(movers_name.get_text())
    print(movers)

This code prints the lists, but it also raises:

AttributeError: 'NoneType' object has no attribute 'get_text'

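I would like to put this in a for loop so I can read several files, but the error stops that from working. Does anyone know what I am doing wrong? Thank you!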

Answer 1

You should pass the file object to BeautifulSoup as is and let it parse the contents as HTML.

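from bs4 import BeautifulSoup

# Hand the open file handle straight to BeautifulSoup; no readlines()/join() needed.
with open('Materials.htm', 'r') as f:
    Stock_page = BeautifulSoup(f, "html.parser")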
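For the "loop over different files" part of the question, here is a minimal sketch of one way it could look. The helper names extract_left_aligned_cells and extract_from_files are made up for illustration and do not come from the original post. The sketch assumes every file marks the wanted cells with exactly style="text-align:left;". Note that find_all() returns an empty list when nothing matches, whereas find() returns None, and calling get_text() on that None is the usual source of this AttributeError.

```python
from bs4 import BeautifulSoup


def extract_left_aligned_cells(markup):
    """Return the text of every <td style="text-align:left;"> cell.

    `markup` may be an HTML string or an open file handle.
    (Hypothetical helper, named for illustration only.)
    """
    soup = BeautifulSoup(markup, "html.parser")
    movers = []  # created once, outside the loop, so the results accumulate
    # find_all() yields only real Tag objects, so get_text() is always safe here.
    for cell in soup.find_all("td", style="text-align:left;"):
        movers.append(cell.get_text(strip=True))
    return movers


def extract_from_files(paths):
    """Map each file path to the list of cell texts found in that file."""
    results = {}
    for path in paths:
        with open(path, "r") as f:
            results[path] = extract_left_aligned_cells(f)
    return results
```

Calling extract_from_files(['Materials.htm']) would then return a dictionary with one list of cell texts per file. A file with no matching cells gives an empty list rather than an error.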

Extracting text from an HTML file causes an AttributeError
