Update!
Note that I had to change the years to int in both df and the pivot table (after unstacking). This caused me some trouble :)
Sample data:
import pandas as pd

d = {'ID': [1, 1, 1, 2, 2, 2],
     'Date': ['01-01-2013', '01-02-2013', '01-03-2013',
              '01-01-2008', '01-02-2008', '01-03-2008'],
     'CUSIP': ['X1', 'X1', 'X1', 'X2', 'X2', 'X2'],
     'X': ['bla', 'bla', 'bla', 'bla', 'bla', 'bla']}
df = pd.DataFrame(data=d)
I have a DataFrame:
Identifier CUSIP X Date
0 1 X1 bla 2013-01-01
1 1 X1 bla 2013-01-02
2 1 X1 bla 2013-01-03
3 2 X2 bla 2008-01-01
4 2 X2 bla 2008-01-02
5 2 X2 bla 2008-01-03
and a pivot table:
2008 2009 2010 2011 2012 2013
CUSIP
X1 1 1 1 1 1 1
X2 2 2 2 2 2 2
I would like to end up with the following layout:
Identifier CUSIP X Date Values
0 1 X1 bla 2013-01-01 1
1 1 X1 bla 2013-01-02 1
2 1 X1 bla 2013-01-03 1
3 2 X2 bla 2008-01-01 2
4 2 X2 bla 2008-01-02 2
5 2 X2 bla 2008-01-03 2
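The dtype issue mentioned in the update note can be sketched as follows. This is a minimal illustration, assuming the pivot table's year columns arrived as strings; the name `piv` and the two-column table are made up for the example.

```python
import pandas as pd

# Hypothetical pivot table whose year columns were read in as strings.
piv = pd.DataFrame(
    [[1, 1], [2, 2]],
    index=pd.Index(["X1", "X2"], name="CUSIP"),
    columns=["2008", "2013"],
)

# The string label "2013" does not match the integer 2013 that .dt.year
# produces, so convert the column labels to int before stacking or joining.
piv.columns = piv.columns.astype(int)

print(list(piv.columns))
```

After the conversion, the labels compare equal to the integer years derived from the Date column, so joins and lookups on (CUSIP, year) can find their matches.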
Answer 0 (score: 2)
#if necessary
df['Date'] = pd.to_datetime(df['Date'])
df['year'] = df.Date.dt.year
df1 = df.join(df1.stack().rename('val'), on=['CUSIP', 'year'])
print (df1)
Identifier Date CUSIP X year val
0 1 2013-01-01 X1 bla 2013 1
1 1 2013-01-02 X1 bla 2013 1
2 1 2013-04-03 X1 bla 2013 1
3 2 2008-01-01 X2 bla 2008 2
4 2 2008-01-02 X2 bla 2008 2
5 2 2008-03-03 X2 bla 2008 2
Alternative solution:
df1 = df.join(df1.stack().rename('val'), on=[df['CUSIP'], df['Date'].dt.year])
print (df1)
Identifier Date CUSIP X val
0 1 2013-01-01 X1 bla 1
1 1 2013-01-02 X1 bla 1
2 1 2013-04-03 X1 bla 1
3 2 2008-01-01 X2 bla 2
4 2 2008-01-02 X2 bla 2
5 2 2008-03-03 X2 bla 2
I believe you can also group by year and use transform with a function such as size, mean, or sum:
df['Date'] = pd.to_datetime(df['Date'])
df['Vals'] = df.groupby(['CUSIP', df['Date'].dt.year])['X'].transform('size')
print (df)
Identifier Date CUSIP X Vals
0 1 2013-01-01 X1 bla 5
1 1 2013-01-02 X1 bla 5
2 1 2013-04-03 X1 bla 5
3 1 2013-04-04 X1 bla 5
4 1 2013-05-05 X1 bla 5
5 2 2008-01-01 X2 bla 4
6 2 2008-01-02 X2 bla 4
7 2 2008-03-03 X2 bla 4
8 2 2008-03-04 X2 bla 4
Answer 1 (score: 2)
This is how I would do it. It looks complicated but really isn't; I'm just explaining each step.
Start with a DataFrame like this:
Identifier CUSIP X Date
0 1 X1 bla 2013-01-01
1 1 X1 bla 2013-01-02
2 1 X1 bla 2013-01-03
3 2 X2 bla 2008-01-01
4 2 X2 bla 2008-01-02
5 2 X2 bla 2008-01-03
Add a year column with df['year'] = df.Date.dt.year:
Identifier CUSIP X Date year
0 1 X1 bla 2013-01-01 2013
1 1 X1 bla 2013-01-02 2013
2 1 X1 bla 2013-01-03 2013
3 2 X2 bla 2008-01-01 2008
4 2 X2 bla 2008-01-02 2008
5 2 X2 bla 2008-01-03 2008
Then take your pivot table and stack it. (If you work with pivot tables, understanding stack/unstack will help you tremendously.)
2008 2009 2010 2011 2012 2013
CUSIP
X1 1 1 1 1 1 1
X2 2 2 2 2 2 2
>>> piv.stack()
CUSIP
X1 2008 1
2009 1
2010 1
2011 1
2012 1
2013 1
X2 2008 2
2009 2
2010 2
2011 2
2012 2
2013 2
Then you need to reindex by CUSIP and year so that the values come out in the same order as the rows of the DataFrame.
>>> piv.stack().reindex(df[['CUSIP', 'year']])
CUSIP
X1 2013 1
2013 1
2013 1
X2 2008 2
2008 2
2008 2
dtype: int64
All together:
>>> df['pivot_values'] = piv.stack().reindex(df[['CUSIP', 'year']]).values
>>> df
Identifier CUSIP X Date year pivot_values
0 1 X1 bla 2013-01-01 2013 1
1 1 X1 bla 2013-01-02 2013 1
2 1 X1 bla 2013-01-03 2013 1
3 2 X2 bla 2008-01-01 2008 2
4 2 X2 bla 2008-01-02 2008 2
5 2 X2 bla 2008-01-03 2008 2
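A self-contained sketch of the same steps is below. It builds the reindex target explicitly with `pd.MultiIndex.from_frame` instead of passing the two-column DataFrame directly; that is a substitution on my part, made on the assumption that an explicit MultiIndex is the less ambiguous input. The pivot table `piv` is rebuilt by hand.

```python
import pandas as pd

df = pd.DataFrame({
    "CUSIP": ["X1", "X1", "X1", "X2", "X2", "X2"],
    "year": [2013, 2013, 2013, 2008, 2008, 2008],
})

# Hand-built stand-in for the pivot table.
piv = pd.DataFrame(
    [[1] * 6, [2] * 6],
    index=pd.Index(["X1", "X2"], name="CUSIP"),
    columns=pd.Index(list(range(2008, 2014)), name="year"),
)

# One (CUSIP, year) key per row of df, in df's row order.
keys = pd.MultiIndex.from_frame(df[["CUSIP", "year"]])

# Reindexing the stacked table by those keys yields one value per df row;
# to_numpy() drops the MultiIndex so the values are assigned by position.
df["pivot_values"] = piv.stack().reindex(keys).to_numpy()
print(df)
```

Assigning by position relies on the reindexed result having exactly one entry per row of df, which holds here because the keys are taken from df itself.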
Answer 2 (score: 2)
Suppose my DataFrame is df
df
CUSIP Date ID X
0 X1 01-01-2013 1 bla
1 X1 01-02-2013 1 bla
2 X1 01-03-2013 1 bla
3 X2 01-01-2008 2 bla
4 X2 01-02-2008 2 bla
5 X2 01-03-2008 2 bla
and the pivot table is pv
pv
2008 2009 2010 2011 2012 2013
CUSIP
X1 1 1 1 1 1 1
X2 2 2 2 2 2 2
Solution
Since your dates are just strings, I would pass them through pd.to_datetime. I would also make sure the columns of pv are integers.
df.assign(
PV_Values=
pv.rename(columns=int).lookup(
df.CUSIP, pd.to_datetime(df.Date).dt.year
)
)
CUSIP Date ID X PV_Values
0 X1 01-01-2013 1 bla 1
1 X1 01-02-2013 1 bla 1
2 X1 01-03-2013 1 bla 1
3 X2 01-01-2008 2 bla 2
4 X2 01-02-2008 2 bla 2
5 X2 01-03-2008 2 bla 2
Note
If the columns of pv are already int and df.Date is already datetime, this is simply:
df.assign(PV_Values=pv.lookup(df.CUSIP, df.Date.dt.year))
CUSIP Date ID X PV_Values
0 X1 01-01-2013 1 bla 1
1 X1 01-02-2013 1 bla 1
2 X1 01-03-2013 1 bla 1
3 X2 01-01-2008 2 bla 2
4 X2 01-02-2008 2 bla 2
5 X2 01-03-2008 2 bla 2
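`DataFrame.lookup` was deprecated in pandas 1.2 and removed in pandas 2.0, so the calls above fail on current versions. One possible replacement, sketched here with positional NumPy indexing, is shown below; it is not the only option, and the `%m-%d-%Y` date format and the hand-built `pv` are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "CUSIP": ["X1", "X1", "X1", "X2", "X2", "X2"],
    "Date": ["01-01-2013", "01-02-2013", "01-03-2013",
             "01-01-2008", "01-02-2008", "01-03-2008"],
    "ID": [1, 1, 1, 2, 2, 2],
    "X": ["bla"] * 6,
})
pv = pd.DataFrame(
    [[1] * 6, [2] * 6],
    index=pd.Index(["X1", "X2"], name="CUSIP"),
    columns=list(range(2008, 2014)),
)

# Translate row and column labels into integer positions.
years = pd.to_datetime(df["Date"], format="%m-%d-%Y").dt.year  # assumed format
rows = pv.index.get_indexer(df["CUSIP"])
cols = pv.columns.get_indexer(years)

# get_indexer returns -1 for labels it cannot find; fail loudly instead of
# silently picking the last row or column.
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("some CUSIP/year pairs are missing from pv")

# Pick one cell per row of df by position.
out = df.assign(PV_Values=pv.to_numpy()[rows, cols])
print(out)
```

The join and reindex approaches in the other answers avoid `lookup` entirely and are unaffected by its removal.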