Fill NaN based on MultiIndex in pandas

Time: 2016-08-15 18:56:37

Tags: python pandas

I have a pandas DataFrame in which I want to fill in some NaN values.

import pandas as pd

tuples = [('a', 1990),('a', 1994),('a',1996),('b',1992),('b',1997),('c',2001)]
index = pd.MultiIndex.from_tuples(tuples, names = ['Type', 'Year'])
vals = ['NaN','NaN','SomeName','NaN','SomeOtherName','SomeThirdName']
df = pd.DataFrame(vals, index=index)

print(df)

                       0
Type Year               
a    1990            NaN
     1994            NaN
     1996       SomeName
b    1992            NaN
     1997  SomeOtherName
c    2001  SomeThirdName

The output I want is:

Type Year               
a    1990       SomeName
     1994       SomeName
     1996       SomeName
b    1992  SomeOtherName
     1997  SomeOtherName
c    2001  SomeThirdName

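This needs to be done on a much larger DataFrame (millions of rows), where each 'Type' can have 1-5 unique 'Years' and the name value appears only for the most recent year. I am trying to avoid iterating over rows for performance reasons.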

1 answer:

Answer 0: (score: 2)

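You can sort the data frame by its index in descending order and then forward-fill it with ffill:

import pandas as pd
df.sort_index(level = [0,1], ascending = False).ffill()

#                        0
# Type Year
# c    2001  SomeThirdName
# b    1997  SomeOtherName
#      1992  SomeOtherName
# a    1996       SomeName
#      1994       SomeName
#      1990       SomeName

Note: the example data does not actually contain np.nan values but the string "NaN", so for ffill to work you first need to replace the "NaN" strings with np.nan:

import numpy as np
df[0] = np.where(df[0] == "NaN", np.nan, df[0])

Or, as @ayhan suggested, after replacing the string "NaN" with np.nan, use bfill on the data in its original order.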
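For reference, here is a minimal end-to-end sketch of the steps described in this answer, run on the question's sample data. The use of mask to convert the placeholder strings and the names filled_sorted and filled_grouped are my own choices for illustration. The groupby variant at the end is not part of the original answer; it is an assumption on my part about how you might stop a value from leaking across a 'Type' boundary if some group had no name at all.

```python
import pandas as pd

tuples = [('a', 1990), ('a', 1994), ('a', 1996),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = ['NaN', 'NaN', 'SomeName', 'NaN', 'SomeOtherName', 'SomeThirdName']
df = pd.DataFrame(vals, index=index)

# The sample holds the string "NaN", not a real missing value,
# so convert it first; mask() puts NaN wherever the condition is True.
df[0] = df[0].mask(df[0] == 'NaN')

# Approach from the answer: sort descending so the most recent year
# comes first in each Type, forward-fill, then restore ascending order.
filled_sorted = (
    df.sort_index(level=[0, 1], ascending=False)
      .ffill()
      .sort_index()
)

# Hypothetical variant (not in the original answer): backward-fill
# within each Type, so a Type without any name stays NaN instead of
# borrowing a value from a neighbouring Type.
filled_grouped = df.sort_index().groupby(level='Type').bfill()

print(filled_sorted)
print(filled_grouped)
```

On this sample both approaches give the same result. The plain ffill/bfill relies on every 'Type' having a name in its most recent year, which the question states is the case; the grouped version trades a little speed for not depending on that.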