I have a dataset structured as follows:
"Date","Time","Open","High","Low","Close","Up","Down","Volume"
01/03/2000,00:05,1481.50,1481.50,1481.00,1481.00,2,0,0.00
01/03/2000,00:10,1480.75,1480.75,1480.75,1480.75,1,0,1.00
01/03/2000,00:20,1480.50,1480.50,1480.50,1480.50,1,0,1.00
[...]
03/01/2018,11:05,2717.25,2718.00,2708.50,2709.25,9935,15371,25306.00
03/01/2018,11:10,2709.25,2711.75,2706.50,2709.50,8388,8234,16622.00
03/01/2018,11:15,2709.25,2711.50,2708.25,2709.50,4738,4703,9441.00
03/01/2018,11:20,2709.25,2709.50,2706.00,2707.25,3609,4685,8294.00
I read the file like this:
rows = pd.read_csv("Datasets/myfile.txt")
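I would like to use pandas to get the following for each day (grouping by date): the first value of "Open", the last value of "Close", the highest value of "High", the lowest value of "Low", and the sum of "Volume".
I know how to do this with a few for loops, but that is very inefficient. Is it possible to do it in a few lines with pandas?
Thanks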
Answer 0 (score: 2)
Use groupby and agg:
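# df is the DataFrame returned by pd.read_csv (named rows in the question)
df.groupby('Date').agg({
    'Close': 'last',   # last value of the day (relies on rows being in time order)
    'Open': 'first',   # first value of the day
    'High': 'max',
    'Low': 'min',
    'Volume': 'sum'
})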
Output:
              Close     Open    High     Low   Volume
Date
01/03/2000  1480.50  1481.50  1481.5  1480.5      2.0
03/01/2018  2707.25  2717.25  2718.0  2706.0  59663.0
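For anyone who wants to try this end to end, here is a minimal self-contained sketch built only from the sample rows shown in the question. It makes two assumptions that the question does not state: that the dates are in MM/DD/YYYY format, and that it is worth sorting by date and time first so that 'first' and 'last' really pick the opening and closing bars. Parsing the dates also makes the groups sort chronologically rather than as strings.

```python
import io

import pandas as pd

# Sample rows copied from the question; a real run would read the full file instead.
SAMPLE = '''"Date","Time","Open","High","Low","Close","Up","Down","Volume"
01/03/2000,00:05,1481.50,1481.50,1481.00,1481.00,2,0,0.00
01/03/2000,00:10,1480.75,1480.75,1480.75,1480.75,1,0,1.00
01/03/2000,00:20,1480.50,1480.50,1480.50,1480.50,1,0,1.00
03/01/2018,11:05,2717.25,2718.00,2708.50,2709.25,9935,15371,25306.00
03/01/2018,11:10,2709.25,2711.75,2706.50,2709.50,8388,8234,16622.00
03/01/2018,11:15,2709.25,2711.50,2708.25,2709.50,4738,4703,9441.00
03/01/2018,11:20,2709.25,2709.50,2706.00,2707.25,3609,4685,8294.00
'''

df = pd.read_csv(io.StringIO(SAMPLE))

# Assumption: dates are MM/DD/YYYY. Converting to datetime makes the
# grouped result sort chronologically instead of alphabetically.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# 'first' and 'last' depend on row order, so sort by date and time first.
# Zero-padded "HH:MM" strings sort correctly as plain text.
df = df.sort_values(["Date", "Time"])

# One aggregation per column, listed in the usual OHLCV order.
daily = df.groupby("Date").agg({
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
    "Volume": "sum",
})

print(daily)
```

If the file is already in chronological order, the sort step is redundant but harmless.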