I'm working on a somewhat unusual problem with Pandas DataFrames. I have two data frames:
df1
time, Date, Stock, StartTime, EndTime
2016-10-11 12:00:00 2016-10-11 ABC 12:00:00.243 13:06:34.232
2016-10-11 12:01:00 2016-10-11 ABC 12:02:00.243 13:04:34.232
2016-10-11 12:03:00 2016-10-11 XYZ 08:02:00.243 11:24:23.533
df2
time, Date, Stock, Price, Volume
2016-10-11 12:00:00 2016-10-11 ABC 10.0 100
2016-10-11 12:01:00 2016-10-11 ABC 10.1 300
...
2016-10-11 16:01:00 2016-10-11 ABC 10.4 600
2016-10-11 12:01:00 2016-10-11 XYZ 5.1 1500
...
2016-10-11 17:01:00 2016-10-11 XYZ 10.1 200
...
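Now, for each row in df1, I want to join it to df2 on the Date and Stock columns, so that I can compute the weighted price of all df2 rows that fall between that df1 row's StartTime and EndTime.
Thanks a lot for your help.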
Answer 0 (score: 0)
Merge, group by, and apply a weighted-average function.
I moved the data into code so that people can load it easily.
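To make the weighted-average step concrete, here is a minimal sketch that uses only the three ABC rows from the question's df2. The variable names (`prices`, `volumes`, `manual`, `weighted`) are mine and purely illustrative. It shows that `np.average` with `weights` matches the hand-written formula sum(price * volume) / sum(volume).

```python
import numpy as np

# Prices and volumes of the three ABC rows in df2 from the question.
prices = np.array([10.0, 10.1, 10.4])
volumes = np.array([100, 300, 600])

# Volume-weighted average written out by hand.
manual = (prices * volumes).sum() / volumes.sum()

# The same quantity via NumPy's weighted average.
weighted = np.average(prices, weights=volumes)

print(round(manual, 6), round(weighted, 6))
```

Both values come out to 10.27, the figure that appears for ABC in the results further down.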
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11'}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'XYZ'}, 'EndTime': {0: '13:06:34.232', 1: '13:04:34.232', 2: '11:24:23.533'}, 'StartTime': {0: '12:00:00.243', 1: '12:02:00.243', 2: '08:02:00.243'}, 'time': {0: '12:00:00', 1: '12:01:00', 2: '12:03:00'}})
df2 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11', 3: '2016-10-11', 4: '2016-10-11'}, 'Volume': {0: 100, 1: 300, 2: 600, 3: 1500, 4: 200}, 'Price': {0: 10.0, 1: 10.1, 2: 10.4, 3: 5.0999999999999996, 4: 10.1}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'XYZ', 4: 'XYZ'}, 'time': {0: '12:00:00', 1: '12:01:00', 2: '16:01:00', 3: '12:01:00', 4: '17:01:00'}})
print(df1)
print(df2)
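I'm assuming your data frames look like the ones below. The question is a bit unclear, so let me know and we can adjust the example so that the answer fits what you need. I left out the redundant date in the time field: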
Date EndTime StartTime Stock time
0 2016-10-11 13:06:34.232 12:00:00.243 ABC 12:00:00
1 2016-10-11 13:04:34.232 12:02:00.243 ABC 12:01:00
2 2016-10-11 11:24:23.533 08:02:00.243 XYZ 12:03:00
Date Price Stock Volume time
0 2016-10-11 10.0 ABC 100 12:00:00
1 2016-10-11 10.1 ABC 300 12:01:00
2 2016-10-11 10.4 ABC 600 16:01:00
3 2016-10-11 5.1 XYZ 1500 12:01:00
4 2016-10-11 10.1 XYZ 200 17:01:00
df_merged = df1.merge(df2, on=['Date', 'Stock'])  # Merge on Date and Stock
df_merged = df_merged[['StartTime', 'EndTime', 'Price', 'Volume', 'Stock']]  # Keep only the needed columns
Without Stock Partition:
print(df_merged.groupby(['StartTime', 'EndTime']).apply(lambda x: np.average(x['Price'], weights=x['Volume'])))
With Stock Partition:
print(df_merged.groupby(['StartTime', 'EndTime', 'Stock']).apply(lambda x: np.average(x['Price'], weights=x['Volume'])))
This gives:
StartTime EndTime
08:02:00.243 11:24:23.533 5.688235
12:00:00.243 13:06:34.232 10.270000
12:02:00.243 13:04:34.232 10.270000
dtype: float64
StartTime EndTime Stock
08:02:00.243 11:24:23.533 XYZ 5.688235
12:00:00.243 13:06:34.232 ABC 10.270000
12:02:00.243 13:04:34.232 ABC 10.270000
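Note that the numbers above average over every df2 row that shares a Date and Stock with the df1 row; they are not restricted to the StartTime/EndTime window. If the goal is to keep only the df2 rows whose time falls inside each window, as the question seems to describe, a rough sketch could look like the following. It assumes the time columns hold plain clock strings as in my sample data, and the names `merged`, `in_window`, `sums` and `vwap` are mine. It also sums price * volume per group instead of using `apply`, which is a swapped-in but equivalent way to get the weighted average.

```python
import pandas as pd

# Same sample data as above, written with plain lists.
df1 = pd.DataFrame({
    'Date': ['2016-10-11'] * 3,
    'Stock': ['ABC', 'ABC', 'XYZ'],
    'StartTime': ['12:00:00.243', '12:02:00.243', '08:02:00.243'],
    'EndTime': ['13:06:34.232', '13:04:34.232', '11:24:23.533'],
})
df2 = pd.DataFrame({
    'Date': ['2016-10-11'] * 5,
    'Stock': ['ABC', 'ABC', 'ABC', 'XYZ', 'XYZ'],
    'time': ['12:00:00', '12:01:00', '16:01:00', '12:01:00', '17:01:00'],
    'Price': [10.0, 10.1, 10.4, 5.1, 10.1],
    'Volume': [100, 300, 600, 1500, 200],
})

# Pair every df1 window with every df2 row for the same Date and Stock.
merged = df1.merge(df2, on=['Date', 'Stock'])

# Convert the clock strings to timedeltas so they compare as times, not text.
for col in ['StartTime', 'EndTime', 'time']:
    merged[col] = pd.to_timedelta(merged[col])

# Keep only the rows whose time lies inside the window (inclusive bounds).
inside = (merged['time'] >= merged['StartTime']) & (merged['time'] <= merged['EndTime'])
in_window = merged[inside]

# Weighted price per window: sum(price * volume) / sum(volume).
sums = (in_window.assign(PV=in_window['Price'] * in_window['Volume'])
        .groupby(['Stock', 'StartTime', 'EndTime'])[['PV', 'Volume']].sum())
vwap = sums['PV'] / sums['Volume']
print(vwap)
```

With this sample data only the first ABC window (12:00:00.243 to 13:06:34.232) contains a df2 row, the one at 12:01:00, so the result has a single entry. Windows with no matching rows simply drop out of the result.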