I have a pandas DataFrame in which I want to fill in some NaN values.
import pandas as pd
tuples = [('a', 1990),('a', 1994),('a',1996),('b',1992),('b',1997),('c',2001)]
index = pd.MultiIndex.from_tuples(tuples, names = ['Type', 'Year'])
vals = ['NaN','NaN','SomeName','NaN','SomeOtherName','SomeThirdName']
df = pd.DataFrame(vals, index=index)
print(df)
                       0
Type Year
a    1990            NaN
     1994            NaN
     1996       SomeName
b    1992            NaN
     1997  SomeOtherName
c    2001  SomeThirdName
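My desired output is:
Type Year
a    1990       SomeName
     1994       SomeName
     1996       SomeName
b    1992  SomeOtherName
     1997  SomeOtherName
c    2001  SomeThirdName
This needs to be done on a much larger DataFrame (millions of rows), where each 'Type' can have 1-5 unique 'Year' values and the name value appears only for the most recent year. I am trying to avoid iterating over rows, for performance reasons.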
Answer 0 (score: 2)
You can sort the DataFrame by its index in descending order and then forward-fill it with ffill:
import pandas as pd
df.sort_index(level = [0,1], ascending = False).ffill()
#                        0
# Type Year
# c    2001  SomeThirdName
# b    1997  SomeOtherName
#      1992  SomeOtherName
# a    1996       SomeName
#      1994       SomeName
#      1990       SomeName
Note: the example data does not actually contain np.nan values but the string NaN, so for ffill to work you first need to replace the NaN strings with np.nan:
import numpy as np
df[0] = np.where(df[0] == "NaN", np.nan, df[0])
Or, as @ayhan suggested, after replacing the string "NaN" with np.nan, use df.bfill().
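For reference, here is a minimal end-to-end sketch that rebuilds the question's example data and runs the steps in the order they have to happen (replace the placeholder strings first, then fill). The use of Series.mask for the replacement and the per-group variant at the end are assumptions added for illustration, not part of the original answer. The per-group version only matters if some 'Type' could lack a name entirely, which the question says does not happen.

```python
import pandas as pd

# Rebuild the example data from the question.
tuples = [('a', 1990), ('a', 1994), ('a', 1996),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = ['NaN', 'NaN', 'SomeName', 'NaN', 'SomeOtherName', 'SomeThirdName']
df = pd.DataFrame(vals, index=index)

# Step 1: turn the placeholder strings into real missing values.
# mask() sets entries to NaN wherever the condition is True.
df[0] = df[0].mask(df[0] == 'NaN')

# Step 2a: the answer's approach: sort descending, then forward-fill.
filled_desc = df.sort_index(level=[0, 1], ascending=False).ffill()

# Step 2b: the bfill alternative on the original (ascending) order.
filled = df.bfill()

# Step 2c (assumption, not in the answer): fill within each 'Type' only,
# so a value can never leak from one group into another.
filled_grouped = df.groupby(level='Type').bfill()

print(filled)
```

On data shaped like the question's (every 'Type' has its name in the latest year), all three results agree once sorted back into ascending order. None of them iterates over rows in Python.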