Fill dataframe values for each group of rows

Posted: 2019-03-24 16:41:02

Tags: python pandas

Suppose I have the following dataset:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    NA
1990    Northern Ireland    Female  NA
1990    Northern Ireland    Total   NA
1991    Northern Ireland    Male    NA
1991    Northern Ireland    Female  NA
1991    Northern Ireland    Total   NA
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200
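To experiment with this table, it can be rebuilt as a DataFrame along the following lines. This is only a sketch: the variable name df matches what the answers use, and representing NA as np.nan is an assumption about how the data was loaded.

```python
import numpy as np
import pandas as pd

# Rebuild the question's table; NA is assumed to be a float NaN.
df = pd.DataFrame({
    "Time": np.repeat([1990, 1991, 1992, 1993], 3),
    "Geography": "Northern Ireland",
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
    + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

print(df)
```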

In the end I would like to have the following:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    792100
1990    Northern Ireland    Female  831100
1990    Northern Ireland    Total   1623300
1991    Northern Ireland    Male    792100
1991    Northern Ireland    Female  831100
1991    Northern Ireland    Total   1623300
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200

In other words, I basically want to fill the earlier years with the values from the first year that has no NA.

How can I do this?

3 answers:

Answer 0 (score: 3)

You can try the following:

df.set_index(['Time','Geography','Sex']).unstack().bfill().stack().reset_index()

Output:

   Time         Geography     Sex  Population
0  1990  Northern Ireland  Female    831100.0
1  1990  Northern Ireland    Male    792100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland  Female    831100.0
4  1991  Northern Ireland    Male    792100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland  Female    831100.0
7  1992  Northern Ireland    Male    792100.0
8  1992  Northern Ireland   Total   1623300.0
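The one-liner above can be easier to follow when split into its three steps. The sketch below rebuilds the question's table (the intermediate names wide, filled and result are illustrative, not from the answer) and shows that unstack turns each Sex into its own column, so bfill can pull later years upward column by column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time": np.repeat([1990, 1991, 1992, 1993], 3),
    "Geography": "Northern Ireland",
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
    + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

# Step 1: one row per (Time, Geography), one column per Sex.
wide = df.set_index(["Time", "Geography", "Sex"]).unstack()

# Step 2: fill each column from the next non-missing row below it.
filled = wide.bfill()

# Step 3: return to the long layout with ordinary columns.
result = filled.stack().reset_index()

print(result)
```

Because bfill works down the rows of the wide table, this relies on the rows being ordered by year. With more than one Geography the rows of different regions interleave, so a fill could come from another region; a per-group fill avoids that.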

Answer 1 (score: 3)

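You can chain pandas.DataFrame.sort_values with a backward fill (pandas.DataFrame.bfill, equivalent to fillna(method='bfill'), which recent pandas versions deprecate), then call pandas.DataFrame.sort_index to restore the original index order. A stable sort keeps the rows of each Sex in year order:

df = df.sort_values(['Sex'], kind='mergesort').bfill().sort_index()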

print(df)
   Time         Geography     Sex  Population
0  1990  Northern Ireland    Male    792100.0
1  1990  Northern Ireland  Female    831100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland    Male    792100.0
4  1991  Northern Ireland  Female    831100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland    Male    792100.0
7  1992  Northern Ireland  Female    831100.0
8  1992  Northern Ireland   Total   1623300.0
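To see why sorting first makes a plain backward fill work, it can help to look at the intermediate frame. The sketch below rebuilds the question's table and uses the illustrative names by_sex and filled. It assumes a single Geography and that every Sex block ends with a non-missing value; otherwise a fill could cross from one block into the next.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time": np.repeat([1990, 1991, 1992, 1993], 3),
    "Geography": "Northern Ireland",
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
    + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

# A stable sort groups the rows by Sex while keeping each block in year order.
by_sex = df.sort_values("Sex", kind="mergesort")
print(by_sex)

# Backward fill inside the sorted frame, then restore the original row order.
filled = by_sex.bfill().sort_index()
print(filled)
```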

Answer 2 (score: 1)

I would use groupby together with bfill and ffill (I only add ffill alongside bfill as a safeguard):

df['Population'] = df.groupby(['Geography','Sex']).Population.transform(lambda x: x.ffill().bfill())
df
   Time        Geography     Sex  Population
0  1990  NorthernIreland    Male    792100.0
1  1990  NorthernIreland  Female    831100.0
2  1990  NorthernIreland   Total   1623300.0
3  1991  NorthernIreland    Male    792100.0
4  1991  NorthernIreland  Female    831100.0
5  1991  NorthernIreland   Total   1623300.0
6  1992  NorthernIreland    Male    792100.0
7  1992  NorthernIreland  Female    831100.0
8  1992  NorthernIreland   Total   1623300.0
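A grouped fill also keeps regions separate, which matters once the data holds more than one Geography. The sketch below is an illustration under stated assumptions: "Region B (hypothetical)" is an invented second region with no data at all, added only to show that nothing leaks into it, and only the backward fill the question asks for is applied, using the built-in grouped bfill.

```python
import numpy as np
import pandas as pd

ni = pd.DataFrame({
    "Time": np.repeat([1990, 1991, 1992, 1993], 3),
    "Geography": "Northern Ireland",
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
    + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

# Hypothetical second region with every value missing (illustration only).
other = ni.assign(Geography="Region B (hypothetical)", Population=np.nan)
df = pd.concat([ni, other], ignore_index=True)

# Backward fill within each (Geography, Sex) group; the result keeps df's index.
df["Population"] = df.groupby(["Geography", "Sex"])["Population"].bfill()

print(df)
```

The Northern Ireland rows are filled as in the question, while the hypothetical region stays missing because each group is filled only from its own rows.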