Question

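I want to take a locally stored HTML file, which may contain several tables or none at all, search it for tables that contain the words "surface area" or the abbreviation "SA", and convert each match into a pandas DataFrame using Beautiful Soup. Here is an example of the kind of file I want to work with: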

https://pubs.acs.org/doi/full/10.1021/ol403383y

With the code below I get the following error:

"'NoneType' object has no attribute 'find_parent'".

What is wrong with the way I am using find_parent? And is 'caption' the right thing to search on?
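A small illustration of where that error probably comes from, using plain Python only. It assumes the intended call is `soup.find("caption", text="SA" or "surface area").find_parent("table")`; the variable names `search_text` and `result` are made up for this sketch, and `result = None` merely stands in for what `find` returns when no tag matches.

```python
# Python evaluates the `or` expression first, so only "SA" is ever
# passed to find(); "surface area" is silently dropped.
search_text = "SA" or "surface area"
print(search_text)

# find() returns None when no tag matches. Here None stands in for that
# return value (assumption: no caption's text is exactly "SA").
result = None
try:
    result.find_parent("table")
except AttributeError as exc:
    message = str(exc)
    print(message)
```

So `find_parent` itself is likely fine; the problem is that it is called on `None`. A string passed as `text=` has to equal the caption's text exactly, which a caption such as "Table 2. Surface area of ..." would not.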

Code:

from bs4 import BeautifulSoup
import string
import pandas as pd
import requests
import lxml

filename = input('Please enter HTML filename:   ')

with open (filename, encoding = "UTF-8") as f_input:
    html = f_input.read()
soup = BeautifulSoup(html, 'lxml')
tables = soup.find("caption", text="SA" or "surface area").find_parent("table")
for table in tables:
    if table.find_all('table') != []:
        continue
    df = pd.read_html(str(tables))
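One possible way to do this, offered as a sketch rather than a definitive answer: loop over every table, skip the ones that wrap nested tables, and test the caption and header cells against a regular expression. The function name `find_surface_area_tables` and the constant `PATTERN` are hypothetical, and the sketch assumes that `lxml` is installed and that the keywords appear in `<caption>` or `<th>` elements, which may not hold for every page.

```python
import re
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# "surface area" in any letter case, or "SA" as a standalone upper-case token.
PATTERN = re.compile(r"(?i:surface\s+area)|\bSA\b")


def find_surface_area_tables(html):
    """Return one DataFrame per table whose caption or header cells match PATTERN."""
    soup = BeautifulSoup(html, "lxml")
    frames = []
    for table in soup.find_all("table"):
        # Skip layout tables that only wrap other tables.
        if table.find("table") is not None:
            continue
        labels = table.find_all(["caption", "th"])
        if any(PATTERN.search(label.get_text(" ", strip=True)) for label in labels):
            # read_html returns a list of frames; one <table> yields one frame.
            frames.extend(pd.read_html(StringIO(str(table))))
    return frames


# Tiny made-up document to exercise the function (not taken from the real page).
sample_html = """
<table><caption>Table 1. Yields</caption>
<tr><th>Entry</th><th>Yield (%)</th></tr><tr><td>1</td><td>90</td></tr></table>
<table><caption>Table 2. BET surface area of samples</caption>
<tr><th>Sample</th><th>Area</th></tr>
<tr><td>A</td><td>120</td></tr><tr><td>B</td><td>85</td></tr></table>
"""
sample_frames = find_surface_area_tables(sample_html)
print(len(sample_frames))
```

For a local file, read it with `open(filename, encoding="utf-8")` and pass the resulting string to the function. An empty list comes back when the file has no tables or none of them match, so the caller never has to deal with `None`.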

BeautifulSoup: find a specific table in an HTML file by searching on the column title

0 answers: