Question

I have an HTML file containing 1000 tables, with text between them. For example:

This is table containing data of Bill.
table1.
there is a general text that I can ignore.
This is table containing data of Allen
Table2
there is a general text that I can ignore.

I am trying to get the data in the following form:

Bill.xlsx with table1.
Allen.xlsx with table2.

I am using pandas to get the tables.

import pandas as pd

# read_html returns a list with one DataFrame per <table> it finds
dfs = pd.read_html(url)
for i, table in enumerate(dfs):
    print(table)
    filename = "test" + str(i) + ".xlsx"
    table.to_excel(filename)

To get the names, I read the file line by line:

import re

lines = file.readlines()

While reading each line k, I do:

p = re.compile("This is table containing data of (.*)")
names = p.findall(k)
if names:
    print(names)

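I am able to extract the names separately this way, but can I do both in one pass? Because of some formatting issues, the data sometimes looks like this: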

This is a table containing data of Bill.
table1.
table1a
there is a general text that I can ignore.
This is table containing data of Allen
Table2
there is a general text that I can ignore.

So I cannot keep an array of names and blindly assign file names to the tables by position.
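
A minimal sketch of what such a single pass might look like, using the standard library's html.parser rather than pandas. It walks the document in order, remembers the most recent name, and attaches every table to that name, so a name followed by two tables keeps both. The names NAME_RE, TableGrouper and group_tables are hypothetical. The sketch assumes the heading sentence and the name sit in the same text node, that the name is a single word, that tables are not nested, and that rows and cells are properly closed.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Accepts both "This is table ..." and "This is a table ..." (assumed variants).
NAME_RE = re.compile(r"This is (?:a )?table containing data of\s+(\w+)")


class TableGrouper(HTMLParser):
    """Walk the HTML once, attaching each table to the last name seen."""

    def __init__(self):
        super().__init__()
        self.tables = defaultdict(list)  # name -> list of tables (lists of rows)
        self.current_name = None         # stays None until a heading is seen
        self.in_table = False
        self.rows = None
        self.row = None
        self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.in_table = True
            self.rows = []
        elif self.in_table and tag == "tr":
            self.row = []
        elif self.in_table and tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.in_table:
            self.tables[self.current_name].append(self.rows)
            self.in_table = False
            self.rows = None
        elif tag in ("td", "th") and self.cell is not None and self.row is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None and self.rows is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.in_table:
            if self.cell is not None:
                self.cell.append(data)  # text inside a cell
        else:
            match = NAME_RE.search(data)
            if match:
                self.current_name = match.group(1)  # applies to following tables


def group_tables(html):
    """Return {name: [table, ...]}, where each table is a list of rows."""
    parser = TableGrouper()
    parser.feed(html)
    parser.close()
    return dict(parser.tables)
```

Each collected table could then be handed to pandas, for example pd.DataFrame(rows).to_excel(f"{name}_{i}.xlsx"). That file naming scheme is only an assumption for the case where one name owns several tables.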

Using tables and string comparison

0 answers: