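I am trying to extract tables from multiple HTML files. Ideally I would end up with the rows and columns in lists so that I can process them further. I am new to BeautifulSoup and cannot get this to work. I think the main problem occurs when a function returns None, which makes further processing impossible. I tried an if statement, but that did not help. My current code: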
```python
from bs4 import BeautifulSoup
from tqdm import tqdm

# lowercase_dict (filename -> HTML text) and data (a list) are assumed to be defined earlier
table_dict = {}
for filename, text in tqdm(lowercase_dict.items()):
    soup = BeautifulSoup(text, "lxml")
    table = soup.find('table')
    table_body = table.find('tbody')
    if table_body is not None:
        tables = table_body

    rows = tables.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])
    table_dict[filename] = cols
```
The error I get:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-304-14ade2e7b2ac> in <module>()
      7         tables = table_body
      8
----> 9     rows = tables.find_all('tr')
     10     for row in rows:
     11         cols = row.find_all('td')

AttributeError: 'str' object has no attribute 'find_all'
```
Answer 0 (score: 0)
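According to your error message, the problem is that the variable tables is a string. That can happen when table.find('tbody') returns None: the if block is skipped, so tables keeps whatever value it was given earlier, which here appears to be a string. Try it without 'tbody':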
```python
for filename, text in tqdm(lowercase_dict.items()):
    soup = BeautifulSoup(text, "lxml")
    table = soup.find('table')
    # Search the table directly instead of going through <tbody>
    rows = table.find_all('tr')
```
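
Note that soup.find('table') itself returns None for a file that contains no table, so the loop above can still fail on such files. Below is a minimal sketch of one way to guard against that and collect each table as a list of rows. The helper name extract_table_rows and the sample_docs dictionary are hypothetical stand-ins (sample_docs plays the role of lowercase_dict), and the built-in "html.parser" is used instead of "lxml" only so the sketch has no extra dependency; swapping "lxml" back in should work the same way for this purpose.

```python
from bs4 import BeautifulSoup


def extract_table_rows(html):
    """Return the first table in `html` as a list of rows (lists of cell text).

    Returns an empty list when the document contains no <table>.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:  # find() returns None when nothing matches
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Include <th> cells so header rows are not dropped
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:  # skip rows that contain no cells at all
            rows.append(cells)
    return rows


# Hypothetical stand-in for lowercase_dict: filename -> HTML text
sample_docs = {
    "with_table.html": "<table><tr><th>a</th><th>b</th></tr>"
                       "<tr><td>1</td><td>2</td></tr></table>",
    "no_table.html": "<p>nothing here</p>",
}

table_dict = {name: extract_table_rows(text) for name, text in sample_docs.items()}
```

Unlike the code in the question, this keeps every row per file rather than only the last one, and it does not filter out empty cells, so column positions stay aligned. Add a filter back if empty cells should be dropped.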