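These are buy and sell orders, each of size 0.02. However, they were filled in smaller parts, so each order now occupies more than one row. I want to merge the rows that belong to the same order.
The "Time" column gives each order a unique timestamp, so we can see that there are 4 orders:
Order 1 is rows 3 and 4
Order 2 is rows 5, 6 and 7
Order 3 is rows 8 and 9
Order 4 is rows 10, 11 and 12
Original DataFrame: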
| 1| Time | Market | Type | Price | Amount | Total | Fee | Acc |
| 2|-----------|-----------|-------|----------|---------|-----------|----------|---------|
| 3| 17:59:31 | Market 1 | Buy | 1207.55 | 0.0198 | 13.07451 | 0.00989 | MXG 36 |
| 4| 17:59:31 | Market 1 | Buy | 1207.20 | 0.0002 | 0.013086 | 0.00005 | MXG 36 |
| 5| 15:42:12 | Market 1 | Sell | 1146.78 | 0.0100 | 3.073645 | 0.00232 | MXG 36 |
| 6| 15:42:12 | Market 1 | Sell | 1147.44 | 0.0058 | 8.005802 | 0.00746 | MXG 36 |
| 7| 15:42:12 | Market 1 | Sell | 1147.91 | 0.0042 | 2.000000 | 0.00993 | MXG 36 |
| 8| 12:05:45 | Market 1 | Buy | 1355.20 | 0.0077 | 7.433008 | 0.00050 | MXG 36 |
| 9| 12:05:45 | Market 1 | Buy | 1355.00 | 0.0123 | 5.833023 | 0.00755 | MXG 36 |
|10| 10:22:17 | Market 1 | Sell | 1002.07 | 0.0010 | 0.373225 | 0.00238 | MXG 36 |
|11| 10:22:17 | Market 1 | Sell | 1001.35 | 0.0055 | 10.00000 | 0.00003 | MXG 36 |
|12| 10:22:17 | Market 1 | Sell | 1001.20 | 0.0135 | 3.001038 | 0.00330 | MXG 36 |
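For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above and counts the rows per timestamp. It assumes the row-number column is only a display aid and is left out; the names `rows` and `fragments` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the table above (row-number column omitted).
rows = [
    ("17:59:31", "Buy",  1207.55, 0.0198, 13.07451, 0.00989),
    ("17:59:31", "Buy",  1207.20, 0.0002, 0.013086, 0.00005),
    ("15:42:12", "Sell", 1146.78, 0.0100, 3.073645, 0.00232),
    ("15:42:12", "Sell", 1147.44, 0.0058, 8.005802, 0.00746),
    ("15:42:12", "Sell", 1147.91, 0.0042, 2.000000, 0.00993),
    ("12:05:45", "Buy",  1355.20, 0.0077, 7.433008, 0.00050),
    ("12:05:45", "Buy",  1355.00, 0.0123, 5.833023, 0.00755),
    ("10:22:17", "Sell", 1002.07, 0.0010, 0.373225, 0.00238),
    ("10:22:17", "Sell", 1001.35, 0.0055, 10.00000, 0.00003),
    ("10:22:17", "Sell", 1001.20, 0.0135, 3.001038, 0.00330),
]
df = pd.DataFrame(rows, columns=["Time", "Type", "Price", "Amount", "Total", "Fee"])
df.insert(1, "Market", "Market 1")  # constant column, same value on every row
df["Acc"] = "MXG 36"                # constant column, same value on every row

# Number of rows (partial fills) per order; sort=False keeps the
# timestamps in their order of first appearance.
fragments = df.groupby("Time", sort=False).size()
print(fragments)
```

The four timestamps should come out with 2, 3, 2 and 3 rows respectively, matching the order list above.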
Desired final result:
| 1| Time | Market | Type | Price | Amount | Total | Fee | Acc |
| 2|-----------|-----------|-------|-----------|---------|-----------|----------|---------|
| 3| 17:59:31 | Market 1 | Buy | avg price | 0.0200 | 13.087596 | 0.00994 | MXG 36 |
| 4| 15:42:12 | Market 1 | Sell | avg price | 0.0200 | 13.079447 | 0.01971 | MXG 36 |
| 5| 12:05:45 | Market 1 | Buy | avg price | 0.0200 | 13.266031 | 0.00805 | MXG 36 |
| 6| 10:22:17 | Market 1 | Sell | avg price | 0.0200 | 13.374263 | 0.00571 | MXG 36 |
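So what is being done here is roughly this: for each order, Price is averaged while Amount, Total and Fee are summed.
The closest I have got is: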
df.pivot_table(index= 'Time', values = ['Amount', 'Total', 'Fee'], aggfunc = 'sum')
| 1| | Amount | Total | Fee |
| 2| Time | | | |
| 3|-----------|---------|-----------|-----------|
| 4| 17:59:31 | 0.0200 | 'correct' | 'correct' |
| 5| 15:42:12 | 0.0200 | 'correct' | 'correct' |
| 6| 12:05:45 | 0.0200 | 'correct' | 'correct' |
| 7| 10:22:17 | 0.0200 | 'correct' | 'correct' |
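The 'correct' cells are just there to save me time (creating the tables is quite time-consuming :P), but they do show the expected results. However, all the other columns are lost, including the "Type" column, which holds the Buy/Sell value that needs to stay with each order.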
Answer 0 (score: 1)
IIUC, I would use groupby with a dictionary passed to agg, like this:
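# Per-column aggregation: keep the first value of the descriptive columns,
# average the price, and sum the numeric columns.
d_agg = {'Market': 'first',
         'Type': 'first',
         'Price': 'mean',
         'Amount': 'sum',
         'Total': 'sum',
         'Fee': 'sum',
         'Acc': 'first'}
# sort=False keeps the orders in their original order of appearance.
# The dictionary already names every column to aggregate, so no extra
# column selection is needed after groupby.
df.groupby('Time', sort=False).agg(d_agg).reset_index()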
Output:
Time Market Type Price Amount Total Fee Acc
0 17:59:31 Market 1 Buy 1207.375000 0.02 13.087596 0.00994 MXG 36
1 15:42:12 Market 1 Sell 1147.376667 0.02 13.079447 0.01971 MXG 36
2 12:05:45 Market 1 Buy 1355.100000 0.02 13.266031 0.00805 MXG 36
3 10:22:17 Market 1 Sell 1001.540000 0.02 13.374263 0.00571 MXG 36
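To try the groupby approach end to end, here is a self-contained sketch on the first two orders only. The reduced sample and the name `merged` are assumptions made for illustration; the aggregation dictionary is the one from the answer.

```python
import pandas as pd

# First two orders from the question (5 rows), enough to see the merge.
df = pd.DataFrame({
    "Time":   ["17:59:31", "17:59:31", "15:42:12", "15:42:12", "15:42:12"],
    "Market": ["Market 1"] * 5,
    "Type":   ["Buy", "Buy", "Sell", "Sell", "Sell"],
    "Price":  [1207.55, 1207.20, 1146.78, 1147.44, 1147.91],
    "Amount": [0.0198, 0.0002, 0.0100, 0.0058, 0.0042],
    "Total":  [13.07451, 0.013086, 3.073645, 8.005802, 2.000000],
    "Fee":    [0.00989, 0.00005, 0.00232, 0.00746, 0.00993],
    "Acc":    ["MXG 36"] * 5,
})

# 'first' carries the descriptive columns along; numeric columns are
# averaged or summed per timestamp.
d_agg = {"Market": "first", "Type": "first", "Price": "mean",
         "Amount": "sum", "Total": "sum", "Fee": "sum", "Acc": "first"}

# One row per order, in the order the timestamps first appear.
merged = df.groupby("Time", sort=False).agg(d_agg).reset_index()
print(merged)
```

Using 'first' for Market, Type and Acc relies on those values being identical within an order, which holds for the sample data but is worth checking on real data.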
You can also use pivot_table with a dictionary that defines how each column is aggregated, like this:
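# Only the numeric columns are aggregated; the descriptive columns go into
# the index so they are carried through to the result.
d_agg = {'Price': 'mean',
         'Amount': 'sum',
         'Total': 'sum',
         'Fee': 'sum'}
# Note: pivot_table sorts the result by the index keys by default.
df.pivot_table(index=['Time', 'Market', 'Type', 'Acc'],
               values=['Amount', 'Total', 'Fee', 'Price'],
               aggfunc=d_agg)\
  .reset_index()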
Output:
Time Market Type Acc Amount Fee Price Total
0 10:22:17 Market 1 Sell MXG 36 0.02 0.00571 1001.540000 13.374263
1 12:05:45 Market 1 Buy MXG 36 0.02 0.00805 1355.100000 13.266031
2 15:42:12 Market 1 Sell MXG 36 0.02 0.01971 1147.376667 13.079447
3 17:59:31 Market 1 Buy MXG 36 0.02 0.00994 1207.375000 13.087596
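Note that the pivot_table output above comes back sorted by Time rather than in the original order. If the original order matters, one possible way to restore it is sketched below on a reduced sample; the names `first_seen`, `pivoted` and `restored` are made up for illustration, and this is just one option, not the only one.

```python
import pandas as pd

# First two orders from the question; 17:59:31 appears before 15:42:12.
df = pd.DataFrame({
    "Time":   ["17:59:31", "17:59:31", "15:42:12", "15:42:12", "15:42:12"],
    "Market": ["Market 1"] * 5,
    "Type":   ["Buy", "Buy", "Sell", "Sell", "Sell"],
    "Price":  [1207.55, 1207.20, 1146.78, 1147.44, 1147.91],
    "Amount": [0.0198, 0.0002, 0.0100, 0.0058, 0.0042],
    "Total":  [13.07451, 0.013086, 3.073645, 8.005802, 2.000000],
    "Fee":    [0.00989, 0.00005, 0.00232, 0.00746, 0.00993],
    "Acc":    ["MXG 36"] * 5,
})

d_agg = {"Price": "mean", "Amount": "sum", "Total": "sum", "Fee": "sum"}
pivoted = df.pivot_table(index=["Time", "Market", "Type", "Acc"],
                         values=["Amount", "Total", "Fee", "Price"],
                         aggfunc=d_agg).reset_index()

# Timestamps in the order they first appear in the original data.
first_seen = df["Time"].drop_duplicates().tolist()

# Reorder the pivoted rows to follow that order (Time is unique per row here).
restored = pivoted.set_index("Time").loc[first_seen].reset_index()
print(restored)
```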
Answer 1 (score: 0)
Try merging the pivot table result back onto the initial DataFrame:
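import pandas as pd

# Sum Amount, Total and Fee per order. Time is turned back into a regular
# column so it can serve as the merge key.
sums = df.pivot_table(index='Time',
                      values=['Amount', 'Total', 'Fee'],
                      aggfunc='sum').reset_index()
# The summed columns get a "_sum" suffix so they do not clash with the originals.
df = pd.merge(df, sums, on='Time', how='outer', suffixes=('', '_sum'))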
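If this is not what you want (the title suggests it is, but your question is somewhat confusing), perhaps you just want to compute the weighted average price from the columns of the pivot table.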
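A minimal sketch of such a weighted average, assuming each partial fill's Price should be weighted by its Amount (the helper column `PriceXAmount` and the name `vwap` are made up for illustration):

```python
import pandas as pd

# Partial fills of the first two orders from the question.
df = pd.DataFrame({
    "Time":   ["17:59:31", "17:59:31", "15:42:12", "15:42:12", "15:42:12"],
    "Price":  [1207.55, 1207.20, 1146.78, 1147.44, 1147.91],
    "Amount": [0.0198, 0.0002, 0.0100, 0.0058, 0.0042],
})

# Weight each fill's price by the amount it filled.
df["PriceXAmount"] = df["Price"] * df["Amount"]

# Sum the weighted prices and the amounts per order, then divide.
sums = df.groupby("Time", sort=False)[["PriceXAmount", "Amount"]].sum()
vwap = sums["PriceXAmount"] / sums["Amount"]
print(vwap)
```

For the 17:59:31 order this gives about 1207.5465, compared with 1207.375 for the plain mean, because almost all of the order was filled at 1207.55.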