我有一个这样的数据框:
index = [0,1,2,3,4,5]
s = pd.Series([1,1,1,2,2,2],index= index)
t = pd.Series([2007,2008,2011,2006,2007,2009],index= index)
f = pd.Series([2,4,6,8,10,12],index= index)
pp =pd.DataFrame(np.c_[s,t,f],columns = ["group","year","amount"])
pp
group year amount
0 1 2007 2
1 1 2008 4
2 1 2011 6
3 2 2006 8
4 2 2007 10
5 2 2009 12
我想在每组的缺失年份之间添加行。我的愿望数据帧是这样的:
group year amount
0 1.0 2007 2.0
1 1.0 2008 4.0
2 1.0 2009 NaN
3 1.0 2010 NaN
4 1.0 2011 6
5 1.0 2006 8.0
6 2.0 2007 10.0
7 2.0 2008 NaN
8 2.0 2009 12.0
有什么方法可以处理大型数据帧?
答案 0 :(得分:3)
第一个更改日期至日期时间:
df.year = pd.to_datetime(df.year, format='%Y')
set_index
和 resample
df.set_index('year').groupby('group').amount.resample('Y').mean().reset_index()
group year amount
0 1 2007-12-31 2.0
1 1 2008-12-31 4.0
2 1 2009-12-31 NaN
3 1 2010-12-31 NaN
4 1 2011-12-31 6.0
5 2 2006-12-31 8.0
6 2 2007-12-31 10.0
7 2 2008-12-31 NaN
8 2 2009-12-31 12.0