I have an HTML file containing 1000 tables, with text in between them. For example:
This is table containing data of Bill.
table1.
there is a general text that I can ignore.
This is table containing data of Allen
Table2
there is a general text that I can ignore.
I am trying to get the data in the following form:
Bill.xlsx with table1.
Allen.xlsx with table2.
I am using pandas to get the tables.
import pandas as pd

# read_html returns a list of DataFrames, one per <table> found
df = pd.read_html(url)
for i in range(len(df)):
    print(df[i])
    filename = "test" + str(i) + ".xlsx"
    df[i].to_excel(filename)
import re
lines = file.readlines()
While reading each line k, I am doing:
p = re.compile("This is table containing data of (.*)")
names = p.findall(k)  # empty list when the line does not match
if names:
    print(names)
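I am able to extract the names separately, but can I do it all in one pass? Because of formatting issues, the data sometimes looks like this: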
This is a table containing data of Bill.
table1.
table1a
there is a general text that I can ignore.
This is table containing data of Allen
Table2
there is a general text that I can ignore.
So I cannot just keep an array of names and blindly assign them to the tables as filenames.
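One direction that might work is to walk the HTML once, remember the most recent name seen, and file every table under that name. The sketch below is only an illustration under several assumptions: each "containing data of" sentence arrives as a single text node (not split by inline tags), names contain no periods, and the names `TableGrouper`, `group_tables`, `NAME_RE` and the fallback key `"unnamed"` are hypothetical. It uses only the standard library and collects the raw HTML of each table.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed pattern: matches "This is table ..." and "This is a table ...";
# the name ends at the first period, "<", or newline.
NAME_RE = re.compile(r"This is (?:a )?table containing data of\s+([^.<\n]+)")


class TableGrouper(HTMLParser):
    """Collect the raw HTML of each <table> under the most recent name seen."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.groups = {}     # name -> list of table HTML strings
        self.current = None  # name taken from the latest matching sentence
        self.depth = 0       # > 0 while inside a (possibly nested) <table>
        self.buf = []        # pieces of the table currently being collected

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        if self.depth:
            self.buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim inside a table.
        if self.depth:
            self.buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.buf.append(f"</{tag}>")
        if tag == "table":
            self.depth -= 1
            if self.depth == 0:
                # Outermost table closed: store it under the current name.
                key = self.current or "unnamed"
                self.groups.setdefault(key, []).append("".join(self.buf))
                self.buf = []

    def handle_data(self, data):
        if self.depth:
            # Text inside a table is part of the table; re-escape it.
            self.buf.append(escape(data, quote=False))
            return
        match = NAME_RE.search(data)
        if match:
            self.current = match.group(1).strip()


def group_tables(html_text):
    """Return a dict mapping each name to the HTML of the tables after it."""
    parser = TableGrouper()
    parser.feed(html_text)
    parser.close()
    return parser.groups


sample = """
<p>This is a table containing data of Bill.</p>
<table><tr><td>1</td></tr></table>
<table><tr><td>1a</td></tr></table>
<p>there is a general text that I can ignore.</p>
<p>This is table containing data of Allen</p>
<table><tr><td>2</td></tr></table>
"""
groups = group_tables(sample)
for name, tables in groups.items():
    print(name, len(tables))
```

Each collected HTML string could then presumably be passed to `pd.read_html(io.StringIO(table_html))` and written out with `to_excel`, for example one sheet per table in `name + ".xlsx"` via `pd.ExcelWriter`, so a name followed by two tables (table1 and table1a) would still end up in a single file.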