Question

I have a pandas DataFrame in which I want to fill in some NaN values.

import pandas as pd

tuples = [('a', 1990),('a', 1994),('a',1996),('b',1992),('b',1997),('c',2001)]
index = pd.MultiIndex.from_tuples(tuples, names = ['Type', 'Year'])
vals = ['NaN','NaN','SomeName','NaN','SomeOtherName','SomeThirdName']
df = pd.DataFrame(vals, index=index)

print(df)

                       0
Type Year               
a    1990            NaN
     1994            NaN
     1996       SomeName
b    1992            NaN
     1997  SomeOtherName
c    2001  SomeThirdName

My desired output is:

Type Year               
a    1990       SomeName
     1994       SomeName
     1996       SomeName
b    1992  SomeOtherName
     1997  SomeOtherName
c    2001  SomeThirdName

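This needs to be done on a much larger DataFrame (millions of rows), where each 'Type' can have 1-5 unique 'Years' and the name value appears only for the most recent year. I am trying to avoid iterating over rows, for performance reasons.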

Answer 1

You can sort the DataFrame by its index in descending order and then forward-fill it with ffill:

import pandas as pd

df.sort_index(level = [0,1], ascending = False).ffill()

#                        0
# Type Year
# c    2001  SomeThirdName
# b    1997  SomeOtherName
#      1992  SomeOtherName
# a    1996       SomeName
#      1994       SomeName
#      1990       SomeName

Note: the example data does not actually contain np.nan values but the string "NaN", so for ffill to take effect you need to replace the "NaN" strings with np.nan:

import numpy as np

df[0] = np.where(df[0] == "NaN", np.nan, df[0])

Or, as @ayhan suggested, replace the string "NaN" with np.nan and then use a backward fill (df.bfill()) instead.
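
Putting the steps of this answer together, a minimal end-to-end sketch on the question's sample data could look like the following. The variable names filled and filled_by_group are only illustrative, and the final sort_index() call (to restore ascending order) and the groupby variant are additions, not part of the original answer. The groupby variant is included because a plain ffill after sorting would carry a name across a 'Type' boundary if some type had no name in its most recent year; filling within each group avoids that.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing entries are the *string* "NaN".
tuples = [("a", 1990), ("a", 1994), ("a", 1996),
          ("b", 1992), ("b", 1997), ("c", 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=["Type", "Year"])
vals = ["NaN", "NaN", "SomeName", "NaN", "SomeOtherName", "SomeThirdName"]
df = pd.DataFrame(vals, index=index)

# Step 1: turn the placeholder strings into real missing values.
df[0] = np.where(df[0] == "NaN", np.nan, df[0])

# Step 2: sort descending so the most recent year comes first in each Type,
# forward-fill, then (an addition here) sort back into ascending order.
filled = df.sort_index(level=[0, 1], ascending=False).ffill().sort_index()

# Assumed alternative: back-fill inside each Type so that values can never
# leak from one Type into another.
filled_by_group = df.sort_index().groupby(level="Type").bfill()

print(filled)
print(filled_by_group)
```

On the sample data both approaches should give the desired output from the question; which one is faster on millions of rows would need to be measured.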

Filling NaN based on a MultiIndex in pandas
