Suppose I have the following dataset:
Time  Geography         Sex     Population
1990  Northern Ireland  Male    NA
1990  Northern Ireland  Female  NA
1990  Northern Ireland  Total   NA
1991  Northern Ireland  Male    NA
1991  Northern Ireland  Female  NA
1991  Northern Ireland  Total   NA
1992  Northern Ireland  Male    792100
1992  Northern Ireland  Female  831100
1992  Northern Ireland  Total   1623300
1993  Northern Ireland  Male    812100
1993  Northern Ireland  Female  851100
1993  Northern Ireland  Total   1663200
In the end I would like to have the following:
Time  Geography         Sex     Population
1990  Northern Ireland  Male    792100
1990  Northern Ireland  Female  831100
1990  Northern Ireland  Total   1623300
1991  Northern Ireland  Male    792100
1991  Northern Ireland  Female  831100
1991  Northern Ireland  Total   1623300
1992  Northern Ireland  Male    792100
1992  Northern Ireland  Female  831100
1992  Northern Ireland  Total   1623300
1993  Northern Ireland  Male    812100
1993  Northern Ireland  Female  851100
1993  Northern Ireland  Total   1663200
In other words, I want to fill the earlier years with the values from the first year that has no NA.
How can I do this?
Answer 0 (score: 3)
You can try the following:
df.set_index(['Time','Geography','Sex']).unstack().bfill().stack().reset_index()
Output:
   Time         Geography     Sex  Population
0  1990  Northern Ireland  Female    831100.0
1  1990  Northern Ireland    Male    792100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland  Female    831100.0
4  1991  Northern Ireland    Male    792100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland  Female    831100.0
7  1992  Northern Ireland    Male    792100.0
8  1992  Northern Ireland   Total   1623300.0
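To see why this works, it can help to look at the intermediate wide table. The following is a minimal sketch rather than part of the answer itself: it rebuilds the question's data (with NA represented as numpy.nan) and uses the illustrative names `wide`, `filled`, and `result`.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; NA is represented as NaN.
sexes = ['Male', 'Female', 'Total']
known = {1992: [792100, 831100, 1623300], 1993: [812100, 851100, 1663200]}
rows = [(year, 'Northern Ireland', sex, known.get(year, [np.nan] * 3)[i])
        for year in [1990, 1991, 1992, 1993]
        for i, sex in enumerate(sexes)]
df = pd.DataFrame(rows, columns=['Time', 'Geography', 'Sex', 'Population'])

# One row per (Time, Geography), one column per Sex.
wide = df.set_index(['Time', 'Geography', 'Sex'])['Population'].unstack('Sex')

# Back-fill each column: a NaN takes the next non-missing value below it.
filled = wide.bfill()

# Return to the long layout with the original column names.
result = filled.stack().rename('Population').reset_index()
print(result)
```

With more than one region, rows from different regions alternate in the wide table, so a plain bfill could pull a value from another region; a back-fill grouped by region avoids that.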
Answer 1 (score: 3)
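You can chain pandas.DataFrame.sort_values (by Sex, then Time, so each series is contiguous and in year order), pandas.DataFrame.bfill (the equivalent of fillna with method='bfill', which is deprecated in recent pandas versions), and then pandas.DataFrame.sort_index to get back the original row order:
df = df.sort_values(['Sex', 'Time']).bfill().sort_index()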
print(df)
   Time         Geography     Sex  Population
0  1990  Northern Ireland    Male    792100.0
1  1990  Northern Ireland  Female    831100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland    Male    792100.0
4  1991  Northern Ireland  Female    831100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland    Male    792100.0
7  1992  Northern Ireland  Female    831100.0
8  1992  Northern Ireland   Total   1623300.0
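One caveat with sorting and then back-filling the whole frame: it relies on the gaps sitting at the start of each series. The sketch below uses a small hypothetical dataset (not the question's data) in which the gap is at the end of the Female series, to show how the fill can then cross from one Sex to the next.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the gap is at the END of the Female series.
df = pd.DataFrame({
    'Time': [1992, 1992, 1993, 1993],
    'Sex': ['Female', 'Male', 'Female', 'Male'],
    'Population': [831100, 792100, np.nan, 812100],
})

# Sort so each Sex is contiguous and in year order, back-fill, restore order.
filled = df.sort_values(['Sex', 'Time']).bfill().sort_index()

# In the sorted frame the 1993 Female gap is followed by the 1992 Male row,
# so the gap is filled with a Male value rather than a Female one.
mask = (filled['Time'] == 1993) & (filled['Sex'] == 'Female')
leaked = filled.loc[mask, 'Population'].iloc[0]
print(leaked)
```

For data shaped like the question's, where only the leading years are missing, this situation does not arise.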
Answer 2 (score: 1)
I would use groupby with bfill and ffill (I chain ffill and bfill together just as a safeguard):
df['Population'] = df.groupby(['Geography', 'Sex'])['Population'].transform(lambda x: x.ffill().bfill())
df
   Time         Geography     Sex  Population
0  1990  Northern Ireland    Male    792100.0
1  1990  Northern Ireland  Female    831100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland    Male    792100.0
4  1991  Northern Ireland  Female    831100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland    Male    792100.0
7  1992  Northern Ireland  Female    831100.0
8  1992  Northern Ireland   Total   1623300.0
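A self-contained sketch of the grouped fill, for anyone who wants to run it end to end. It rebuilds the question's data (NA as numpy.nan; the helper names `sexes`, `known`, and `rows` are illustrative) and uses transform, which returns a result aligned with the original row index so it can be assigned straight back to the column.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; NA is represented as NaN.
sexes = ['Male', 'Female', 'Total']
known = {1992: [792100, 831100, 1623300], 1993: [812100, 851100, 1663200]}
rows = [(year, 'Northern Ireland', sex, known.get(year, [np.nan] * 3)[i])
        for year in [1990, 1991, 1992, 1993]
        for i, sex in enumerate(sexes)]
df = pd.DataFrame(rows, columns=['Time', 'Geography', 'Sex', 'Population'])

# Fill only within each (Geography, Sex) series, keeping the original row order.
df['Population'] = (
    df.groupby(['Geography', 'Sex'])['Population']
      .transform(lambda s: s.ffill().bfill())
)
print(df)
```

Because the fill happens inside each group, values never cross from one Sex or Geography to another.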