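I have two pandas DataFrames: one stores values and the other stores weights. The value DataFrame is keyed by [Symbol, Date, Hour], and the weight DataFrame is keyed by [Symbol, Date].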
In [8]: value_df = pd.DataFrame({'Symbol': ['S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S3'],
   ...:                          'Date': [20150101, 20150101, 20150101, 20150102, 20150101, 20150102, 20150103],
   ...:                          'Hour': [8, 9, 10, 8, 8, 8, 8],
   ...:                          'value': [10, 10.1, 10.2, 11, 100, 101, 300]})

In [9]: value_df
Out[9]:
       Date  Hour Symbol  value
0  20150101     8     S1   10.0
1  20150101     9     S1   10.1
2  20150101    10     S1   10.2
3  20150102     8     S1   11.0
4  20150101     8     S2  100.0
5  20150102     8     S2  101.0
6  20150103     8     S3  300.0

In [10]: weight_df = pd.DataFrame({'Symbol': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3'],
    ...:                           'Date': [20150101, 20150102, 20150103] * 3,
    ...:                           'Weight': [0.8, 0.9, 1, 1, 1, 1, 0.5, 0.5, 0.5]})

In [11]: weight_df
Out[11]:
       Date Symbol  Weight
0  20150101     S1     0.8
1  20150102     S1     0.9
2  20150103     S1     1.0
3  20150101     S2     1.0
4  20150102     S2     1.0
5  20150103     S2     1.0
6  20150101     S3     0.5
7  20150102     S3     0.5
8  20150103     S3     0.5
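I want to merge these two tables and add a weight column to value_df. It should behave like a Cartesian product within each [Symbol, Date] pair, so that every hourly row receives the weight for its symbol and date. For example: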
       Date  Hour Symbol  value  weight
0  20150101     8     S1   10.0     0.8
1  20150101     9     S1   10.1     0.8
2  20150101    10     S1   10.2     0.8
3  20150102     8     S1   11.0     0.9
4  20150101     8     S2  100.0     1.0
5  20150102     8     S2  101.0     1.0
6  20150103     8     S3  300.0     0.5
The challenge here is the extra 'Hour' column.
Answer 0 (score: 2)
I'm not sure I understand the "challenge". A simple merge already gives the output you want:
>>> pandas.merge(value_df, weight_df, on=['Date', 'Symbol'])
       Date  Hour Symbol  value  Weight
0  20150101     8     S1   10.0     0.8
1  20150101     9     S1   10.1     0.8
2  20150101    10     S1   10.2     0.8
3  20150102     8     S1   11.0     0.9
4  20150101     8     S2  100.0     1.0
5  20150102     8     S2  101.0     1.0
6  20150103     8     S3  300.0     0.5
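
The merge works because 'Hour' is not one of the join keys: each value_df row is matched on ['Symbol', 'Date'] alone, so every hour of a given symbol and date picks up the same weight. As a minimal, self-contained sketch of the same idea, the snippet below rebuilds the two frames from the question and merges them. The `how='left'` and `validate='many_to_one'` arguments are additions that are not part of the answer above; they assume a pandas version whose `merge` supports `validate`, and are there only to keep every value_df row in its original order and to raise an error if weight_df ever contains duplicate (Symbol, Date) pairs.

```python
import pandas as pd

# Frames rebuilt from the question.
value_df = pd.DataFrame({
    'Symbol': ['S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S3'],
    'Date': [20150101, 20150101, 20150101, 20150102, 20150101, 20150102, 20150103],
    'Hour': [8, 9, 10, 8, 8, 8, 8],
    'value': [10, 10.1, 10.2, 11, 100, 101, 300],
})
weight_df = pd.DataFrame({
    'Symbol': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3'],
    'Date': [20150101, 20150102, 20150103] * 3,
    'Weight': [0.8, 0.9, 1, 1, 1, 1, 0.5, 0.5, 0.5],
})

# 'Hour' is absent from the join keys, so it is simply carried along.
# how='left' keeps every row of value_df in its original order (assumption:
# that is what the asker wants); validate='many_to_one' raises MergeError
# if a (Symbol, Date) pair appears more than once in weight_df.
merged = value_df.merge(
    weight_df,
    on=['Symbol', 'Date'],
    how='left',
    validate='many_to_one',
)

print(merged)
```

With a left merge, a value_df row that has no matching weight would be kept with NaN in 'Weight' rather than silently dropped, which tends to make missing weights easier to spot than with the default inner merge.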