Question

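I am trying to use pandas read_excel() to read an Excel file whose rows have a multi-index, where the second level of the index contains missing values. This kind of multi-index is not uncommon in statistical data. How can I prevent read_excel() from filling in the missing values in the index?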

To illustrate, consider the following example:

In [1]: import pandas as pd

In [2]: m_indx = pd.MultiIndex.from_tuples(
   ...:     [ ('foo','',),
   ...:       ('foo','of which bar',),
   ...:       ('baz','',),
   ...:       ('baz','of which qux',),
   ...:     ]
   ...: )

In [3]: df = pd.DataFrame([[10,],[5,],[15,],[3,]], columns=['Volume'], index=m_indx)

In [4]: df
Out[4]: 
                  Volume
foo                   10
    of which bar       5
baz                   15
    of which qux       3

In [5]: df.to_excel("test.xlsx")

In [6]: pd.read_excel('test.xlsx', index_col=[0,1])
Out[6]: 
                  Volume
foo NaN               10
    of which bar       5
baz of which bar      15
    of which qux       3

I would like to suppress the repetition of 'of which bar', since it is not in the Excel file read from disk. (I am using Python 3.7.7 and pandas 1.0.3.)
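One possible workaround, sketched here rather than confirmed as the intended pandas approach: read the sheet without index_col so that no index filling takes place, then rebuild the index by hand, forward-filling only the outer level(s) and replacing blanks in the innermost level with empty strings. The helper name rebuild_multiindex and its n_levels parameter are hypothetical. The sample frame below is an assumption about what pd.read_excel('test.xlsx') returns without index_col (blank cells as NaN, unnamed header cells as 'Unnamed: 0' and 'Unnamed: 1').

```python
import pandas as pd


def rebuild_multiindex(raw, n_levels=2):
    """Hypothetical helper: build a row MultiIndex from the first
    n_levels columns, forward-filling only the outer levels."""
    out = raw.copy()
    idx_cols = list(out.columns[:n_levels])
    outer, inner = idx_cols[:-1], idx_cols[-1]

    # Outer levels: merged cells come back as NaN, so repeat the label above.
    out[outer] = out[outer].ffill()
    # Innermost level: keep blanks as empty strings instead of filling them.
    out[inner] = out[inner].fillna("")

    out = out.set_index(idx_cols)
    # Drop the placeholder names taken from the unnamed header cells.
    out.index.names = [None] * n_levels
    return out


# Assumed stand-in for pd.read_excel("test.xlsx") called without index_col.
raw = pd.DataFrame(
    {
        "Unnamed: 0": ["foo", None, "baz", None],
        "Unnamed: 1": [None, "of which bar", None, "of which qux"],
        "Volume": [10, 5, 15, 3],
    }
)

result = rebuild_multiindex(raw)
print(result)
```

With the real file this would be called as rebuild_multiindex(pd.read_excel('test.xlsx')). The extra step keeps the filling behaviour under explicit control per level.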

pandas: handling missing values in a multi-index with read_excel

0 answers: