How to select the first row of each unique date in a DateTimeIndex

Time: 2018-12-25 14:04:12

Tags: python pandas datetime dataframe

Suppose I have a DataFrame with a DateTimeIndex, like this:

Date_Time   Open    High    Low     Close   Volume
2018-01-22 11:05:00 948.00  948.10  947.95  948.10  9820.0
2018-01-22 11:06:00 948.10  949.60  948.05  949.30  33302.0
2018-01-22 11:07:00 949.25  949.85  949.20  949.85  20522.0
2018-03-27 09:15:00 907.20  908.80  905.00  908.15  126343.0
2018-03-27 09:16:00 908.20  909.20  906.55  906.60  38151.0
2018-03-29 09:30:00 908.90  910.45  908.80  910.15  46429.0

I want to select only the first row for each unique date (ignoring the time), so that I get output like this:

Date_Time   Open    High    Low     Close   Volume
2018-01-22 11:05:00 948.00  948.10  947.95  948.10  9820.0
2018-03-27 09:15:00 907.20  908.80  905.00  908.15  126343.0
2018-03-29 09:30:00 908.90  910.45  908.80  910.15  46429.0

I tried using loc and iloc, but they did not help.

Any help would be greatly appreciated.

1 answer:

Answer 0: (score: 3)

You need to group by the date and take the first row of each group:

import pandas as pd

data = [['2018-01-22 11:05:00', 948.00, 948.10, 947.95, 948.10, 9820.0],
        ['2018-01-22 11:06:00', 948.10, 949.60, 948.05, 949.30, 33302.0],
        ['2018-01-22 11:07:00', 949.25, 949.85, 949.20, 949.85, 20522.0],
        ['2018-03-27 09:15:00', 907.20, 908.80, 905.00, 908.15, 126343.0],
        ['2018-03-27 09:16:00', 908.20, 909.20, 906.55, 906.60, 38151.0],
        ['2018-03-29 09:30:00', 908.90, 910.45, 908.80, 910.15, 46429.0]]

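df = pd.DataFrame(data=data)
# Use the timestamp column (label 0) as the index; it still holds strings here
df = df.set_index([0])
df.columns = ['Open', 'High', 'Low', 'Close', 'Volume']

# Parse the index to datetimes, group by calendar date (time dropped),
# and keep the first row of each group; head(1) preserves the original index
result = df.groupby(pd.to_datetime(df.index).date).head(1)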

print(result)

Output

                      Open    High     Low   Close    Volume
0                                                           
2018-01-22 11:05:00  948.0  948.10  947.95  948.10    9820.0
2018-03-27 09:15:00  907.2  908.80  905.00  908.15  126343.0
2018-03-29 09:30:00  908.9  910.45  908.80  910.15   46429.0
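
If the index is already a real DatetimeIndex, as the question describes, the same idea can probably be written without re-parsing it. The following is only a sketch under that assumption: the variable names (rows, days, first_by_mask, first_by_group) are made up for illustration, and the boolean-mask variant is an alternative to the groupby shown above, not part of the original answer. It also assumes the rows are sorted chronologically, so that "first row" means the earliest row of each day.

```python
import pandas as pd

# Same sample rows as in the question, with the timestamps parsed
# into a real DatetimeIndex (assumption: this matches the asker's frame).
rows = [
    ["2018-01-22 11:05:00", 948.00, 948.10, 947.95, 948.10, 9820.0],
    ["2018-01-22 11:06:00", 948.10, 949.60, 948.05, 949.30, 33302.0],
    ["2018-01-22 11:07:00", 949.25, 949.85, 949.20, 949.85, 20522.0],
    ["2018-03-27 09:15:00", 907.20, 908.80, 905.00, 908.15, 126343.0],
    ["2018-03-27 09:16:00", 908.20, 909.20, 906.55, 906.60, 38151.0],
    ["2018-03-29 09:30:00", 908.90, 910.45, 908.80, 910.15, 46429.0],
]
columns = ["Date_Time", "Open", "High", "Low", "Close", "Volume"]
df = pd.DataFrame(rows, columns=columns)
df["Date_Time"] = pd.to_datetime(df["Date_Time"])
df = df.set_index("Date_Time").sort_index()

# normalize() resets every timestamp to midnight, so rows from the
# same calendar day compare equal.
days = df.index.normalize()

# Variant 1: keep only rows whose day has not appeared earlier in the index.
first_by_mask = df[~days.duplicated(keep="first")]

# Variant 2: group by calendar date and keep the first row of each group,
# as in the answer above; head(1) keeps the full timestamps in the index.
first_by_group = df.groupby(df.index.date).head(1)

print(first_by_mask)
```

Both variants keep the full timestamp in the index. Using .first() instead of .head(1) would replace the index with the dates and take the first non-null value per column, which is not necessarily a single original row.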