Pandas DataFrame VWAP calculation over custom durations

Time: 2016-10-12 22:48:33

Tags: python pandas

I'm working on a slightly unusual problem with pandas DataFrames. I have two DataFrames:

df1
time,                  Date,         Stock,  StartTime,      EndTime
2016-10-11 12:00:00    2016-10-11    ABC     12:00:00.243    13:06:34.232
2016-10-11 12:01:00    2016-10-11    ABC     12:02:00.243    13:04:34.232
2016-10-11 12:03:00    2016-10-11    XYZ     08:02:00.243    11:24:23.533

df2
time,                  Date,         Stock,  Price, Volume
2016-10-11 12:00:00    2016-10-11    ABC     10.0    100
2016-10-11 12:01:00    2016-10-11    ABC     10.1    300
...
2016-10-11 16:01:00    2016-10-11    ABC     10.4    600
2016-10-11 12:01:00    2016-10-11    XYZ     5.1    1500
...
2016-10-11 17:01:00    2016-10-11    XYZ     10.1    200
...

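Now, for each row in df1, I want to join it to df2 on the Date and Stock columns, so that I can compute the volume-weighted price over all df2 rows that fall between that row's StartTime and EndTime.

Thanks very much for your help.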

1 Answer:

Answer 0: (score: 0)

Merge, group, and apply a weighted-average function.
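
For reference, the weighted average meant here is the usual VWAP, sum(Price * Volume) / sum(Volume). A minimal sketch, using only the three ABC rows visible in the question (the elided rows are ignored, so the number is illustrative); the variable names are made up for this example:

```python
import numpy as np

# The three ABC rows shown in the question's df2 sample
prices = np.array([10.0, 10.1, 10.4])
volumes = np.array([100, 300, 600])

# VWAP written out by hand: sum(price * volume) / sum(volume)
vwap_manual = (prices * volumes).sum() / volumes.sum()

# The same quantity via np.average with weights
vwap_numpy = np.average(prices, weights=volumes)

print(vwap_manual, vwap_numpy)
```

Both expressions compute the same thing, so np.average(..., weights=...) can be used as the per-group function.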

Moving the data into code so that people can load it easily.

import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11'}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'XYZ'}, 'EndTime': {0: '13:06:34.232', 1: '13:04:34.232', 2: '11:24:23.533'}, 'StartTime': {0: '12:00:00.243', 1: '12:02:00.243', 2: '08:02:00.243'}, 'time': {0: '12:00:00', 1: '12:01:00', 2: '12:03:00'}})
df2 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11', 3: '2016-10-11', 4: '2016-10-11'}, 'Volume': {0: 100, 1: 300, 2: 600, 3: 1500, 4: 200}, 'Price': {0: 10.0, 1: 10.1, 2: 10.4, 3: 5.0999999999999996, 4: 10.1}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'XYZ', 4: 'XYZ'}, 'time': {0: '12:00:00', 1: '12:01:00', 2: '16:01:00', 3: '12:01:00', 4: '17:01:00'}})

print(df1)
print(df2)

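I'm assuming your DataFrames look like the ones below. The question is a bit unclear, so let me know and we can adjust the example until the answer matches what you need. I have left out the redundant date part of the time field: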

         Date       EndTime     StartTime Stock      time
0  2016-10-11  13:06:34.232  12:00:00.243   ABC  12:00:00
1  2016-10-11  13:04:34.232  12:02:00.243   ABC  12:01:00
2  2016-10-11  11:24:23.533  08:02:00.243   XYZ  12:03:00

         Date  Price Stock  Volume      time
0  2016-10-11   10.0   ABC     100  12:00:00
1  2016-10-11   10.1   ABC     300  12:01:00
2  2016-10-11   10.4   ABC     600  16:01:00
3  2016-10-11    5.1   XYZ    1500  12:01:00
4  2016-10-11   10.1   XYZ     200  17:01:00



df_merged = df1.merge(df2, on=['Date', 'Stock'])  # merge on Date and Stock
df_merged = df_merged[['StartTime', 'EndTime', 'Price', 'Volume', 'Stock']]  # keep only the needed columns

Without Stock Partition:
print(df_merged.groupby(['StartTime', 'EndTime']).apply(lambda x: np.average(x['Price'], weights=x['Volume'])))

With Stock Partition:
print(df_merged.groupby(['StartTime', 'EndTime', 'Stock']).apply(lambda x: np.average(x['Price'], weights=x['Volume'])))

Which gives:

StartTime     EndTime     
08:02:00.243  11:24:23.533     5.688235
12:00:00.243  13:06:34.232    10.270000
12:02:00.243  13:04:34.232    10.270000
dtype: float64


StartTime     EndTime       Stock
08:02:00.243  11:24:23.533  XYZ       5.688235
12:00:00.243  13:06:34.232  ABC      10.270000
12:02:00.243  13:04:34.232  ABC      10.270000
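
Note that these results average over every df2 row for the stock and date; they do not restrict rows to each StartTime to EndTime window, which is what the question asks for. Below is a minimal sketch of one way to add that filter. It assumes the time-of-day strings parse cleanly with pd.to_timedelta and that both ends of the window are inclusive; neither is stated in the question. The names merged, in_window, pv and vwap are introduced here for illustration only:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['2016-10-11'] * 3,
    'Stock': ['ABC', 'ABC', 'XYZ'],
    'StartTime': ['12:00:00.243', '12:02:00.243', '08:02:00.243'],
    'EndTime': ['13:06:34.232', '13:04:34.232', '11:24:23.533'],
})
df2 = pd.DataFrame({
    'Date': ['2016-10-11'] * 5,
    'Stock': ['ABC', 'ABC', 'ABC', 'XYZ', 'XYZ'],
    'time': ['12:00:00', '12:01:00', '16:01:00', '12:01:00', '17:01:00'],
    'Price': [10.0, 10.1, 10.4, 5.1, 10.1],
    'Volume': [100, 300, 600, 1500, 200],
})

# Pair every df1 window with every df2 row for the same Date and Stock
merged = df1.merge(df2, on=['Date', 'Stock'])

# Parse times of day as timedeltas so the comparison is numeric, not string-based
start = pd.to_timedelta(merged['StartTime'])
end = pd.to_timedelta(merged['EndTime'])
tick = pd.to_timedelta(merged['time'])

# Keep only rows whose time lies inside the window (inclusive ends: an assumption)
in_window = merged[(tick >= start) & (tick <= end)]

# VWAP per window: sum(Price * Volume) / sum(Volume)
in_window = in_window.assign(pv=in_window['Price'] * in_window['Volume'])
sums = in_window.groupby(['Stock', 'StartTime', 'EndTime'])[['pv', 'Volume']].sum()
vwap = sums['pv'] / sums['Volume']

print(vwap)
```

With the five sample rows, only the first ABC window contains a trade (the one at 12:01:00), so the result has a single entry; the other two windows drop out because none of the sample rows fall inside them. If you need those windows to appear with NaN, reindex the result against the windows in df1.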