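I'm trying to use pandas read_excel() to read an Excel file with a row MultiIndex whose second level contains missing values. This kind of MultiIndex is not unusual in statistical data. How can I stop read_excel() from filling in the missing values in the index?
To illustrate, consider the following example: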
In [1]: import pandas as pd
In [2]: m_indx = pd.MultiIndex.from_tuples(
   ...:     [('foo', ''),
   ...:      ('foo', 'of which bar'),
   ...:      ('baz', ''),
   ...:      ('baz', 'of which qux'),
   ...:     ]
   ...: )
In [3]: df = pd.DataFrame([[10,],[5,],[15,],[3,]], columns=['Volume'], index=m_indx)
In [4]: df
Out[4]:
                  Volume
foo                   10
    of which bar       5
baz                   15
    of which qux       3
In [5]: df.to_excel("test.xlsx")
In [6]: pd.read_excel('test.xlsx', index_col=[0,1])
Out[6]:
                  Volume
foo NaN               10
    of which bar       5
baz of which bar      15
    of which qux       3
I would like to suppress the repetition of 'of which bar', since it is not in the Excel file read from disk. (I'm using Python 3.7.7 and pandas 1.0.3.)
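
One possible workaround, offered as a sketch rather than a confirmed fix: read the sheet without index_col so that pandas does no filling of its own, then build the index by hand. Only the first level is forward-filled, and gaps in the second level become empty strings. The helper name rebuild_index and the frame raw are hypothetical. The raw frame below is an assumption about what pd.read_excel('test.xlsx') returns when index_col is omitted (merged first-level cells and empty second-level cells both read back as missing), so it is worth checking against the real file.

```python
import pandas as pd


def rebuild_index(raw: pd.DataFrame) -> pd.DataFrame:
    """Use the first two columns of `raw` as a two-level row index.

    Hypothetical helper: the outer level is forward-filled, while
    missing values in the inner level are replaced by ''.
    """
    out = raw.copy()
    outer, inner = out.columns[:2]
    # Outer level: a merged Excel cell only carries its value on the
    # first row, so propagate it downwards.
    out[outer] = out[outer].ffill()
    # Inner level: an empty cell means "no sub-category", so do not fill
    # it from the row above; store an empty string instead.
    out[inner] = out[inner].fillna("")
    out = out.set_index([outer, inner])
    out.index.names = [None, None]
    return out


# Assumed stand-in for: raw = pd.read_excel("test.xlsx")  (no index_col)
raw = pd.DataFrame(
    {
        "Unnamed: 0": ["foo", None, "baz", None],
        "Unnamed: 1": [None, "of which bar", None, "of which qux"],
        "Volume": [10, 5, 15, 3],
    }
)

df = rebuild_index(raw)
print(df)
```

With the real file, the stand-in frame would be replaced by raw = pd.read_excel('test.xlsx') before calling rebuild_index(raw).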