I am trying to call pivot_table on a pandas DataFrame. Here is a sample of the object before the call:
Bid Symb Ask DateTime
0 201000 ESU6 201050 2016-06-19 18:59:58.337134544-05:00
1 201025 ESU6 201075 2016-06-19 18:59:58.337134544-05:00
2 201000 ESU6 201025 2016-06-19 18:59:59.611987128-05:00
3 200975 ESU6 201025 2016-06-19 18:59:59.995825670-05:00
As you can see, the DateTime column contains duplicate values. I would like to call result = object.pivot_table(columns='Symb', values=['Bid','Ask','DateTime'], index=object.index) to get the following DataFrame:
Bid Ask DateTime
Symb ESU6 ESU6 ESU6
0 201000 201050 2016-06-19 18:59:58.337134544-05:00
1 201025 201075 2016-06-19 18:59:58.337134544-05:00
2 201000 201025 2016-06-19 18:59:59.611987128-05:00
3 200975 201025 2016-06-19 18:59:59.995825670-05:00
However, DateTime holds non-numeric values, so pivot_table leaves it out of its output. Ultimately, this is the result I want:
Bid.ESU6 Ask.ESU6
DateTime
2016-06-19 18:59:58.337134544-05:00 201000 201050
2016-06-19 18:59:58.337134544-05:00 201025 201075
2016-06-19 18:59:59.611987128-05:00 201000 201025
2016-06-19 18:59:59.995825670-05:00 200975 201025
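[Note: the problem is that pivot_table does not allow a non-unique index in the first place (or rather, it truncates the data so that the index becomes unique), so I cannot simply call result = object.pivot_table(columns='Symb', values=['Bid','Ask'], index=object['DateTime']). Also, if I use the integers as the index, I cannot simply call result = object.pivot_table(columns='Symb', values=['Bid','Ask','DateTime'], index=object.index), because the DateTime column holds non-numeric values and pivot_table just drops it from the result. Another workaround would be to convert DateTime to a numeric representation and convert it back afterwards, but that is expensive and takes too long, since my DataFrame has more than 100,000 rows.]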
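The truncation described in the note can be reproduced with a minimal sketch built from the four sample rows. The name df is illustrative, and the default aggregation of pivot_table (the mean) is assumed; rows that share a timestamp are grouped and averaged, which is why one row disappears.

```python
import pandas as pd

# Sample rows from the question; `df` is an illustrative name.
df = pd.DataFrame({
    "Bid": [201000, 201025, 201000, 200975],
    "Symb": ["ESU6"] * 4,
    "Ask": [201050, 201075, 201025, 201025],
    "DateTime": pd.to_datetime([
        "2016-06-19 18:59:58.337134544-05:00",
        "2016-06-19 18:59:58.337134544-05:00",
        "2016-06-19 18:59:59.611987128-05:00",
        "2016-06-19 18:59:59.995825670-05:00",
    ]),
})

# Pivoting on DateTime groups equal timestamps and aggregates each group
# (mean by default), so the two rows at 18:59:58.337134544 become one.
by_time = df.pivot_table(index="DateTime", columns="Symb", values=["Bid", "Ask"])

print(len(df), len(by_time))
print(by_time)
```

The first row of by_time holds the average of the two bids that share a timestamp rather than either original value.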
Thanks in advance for your help!
Answer 0 (score: 1)
You can pivot on the values of the index and then set the DateTime column as the new index:
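# pivot on the unique integer index so that no rows are merged
result = object.pivot_table(columns='Symb', values=['Bid','Ask'], index=object.index)
# assumes object.index is already sorted, so the row order is unchanged
result.index = object.DateTime
# flatten the MultiIndex in the columns, e.g. ('Bid', 'ESU6') -> 'Bid.ESU6'
result.columns = ['.'.join(col) for col in result.columns]
print(result)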
Bid.ESU6 Ask.ESU6
DateTime
2016-06-19 18:59:58.337134544-05:00 201000 201050
2016-06-19 18:59:58.337134544-05:00 201025 201075
2016-06-19 18:59:59.611987128-05:00 201000 201025
2016-06-19 18:59:59.995825670-05:00 200975 201025
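As a possible alternative, not part of the answer above, the same reshaping can be sketched without any aggregation by moving DateTime and Symb into the index and unstacking Symb. The names df and wide are illustrative, and the sketch assumes each (row, Symb) pair occurs only once, which holds here because the original row position stays in the index until after the unstack.

```python
import pandas as pd

# Sample rows from the question; `df` and `wide` are illustrative names.
df = pd.DataFrame({
    "Bid": [201000, 201025, 201000, 200975],
    "Symb": ["ESU6"] * 4,
    "Ask": [201050, 201075, 201025, 201025],
    "DateTime": pd.to_datetime([
        "2016-06-19 18:59:58.337134544-05:00",
        "2016-06-19 18:59:58.337134544-05:00",
        "2016-06-19 18:59:59.611987128-05:00",
        "2016-06-19 18:59:59.995825670-05:00",
    ]),
})

# Keep the original row position as the first index level so every row
# stays unique, then move Symb from the index into the columns.
wide = df.set_index(["DateTime", "Symb"], append=True).unstack("Symb")

# Drop the positional level; the remaining DateTime index may repeat.
wide = wide.droplevel(0)

# Flatten the column MultiIndex, e.g. ('Bid', 'ESU6') -> 'Bid.ESU6'.
wide.columns = [".".join(col) for col in wide.columns]

print(wide)
```

Because nothing is aggregated, the duplicated timestamp keeps both of its rows and DateTime never has to pass through a numeric conversion.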