Suppose I have a DataFrame with a DatetimeIndex, as shown below:
Date_Time Open High Low Close Volume
2018-01-22 11:05:00 948.00 948.10 947.95 948.10 9820.0
2018-01-22 11:06:00 948.10 949.60 948.05 949.30 33302.0
2018-01-22 11:07:00 949.25 949.85 949.20 949.85 20522.0
2018-03-27 09:15:00 907.20 908.80 905.00 908.15 126343.0
2018-03-27 09:16:00 908.20 909.20 906.55 906.60 38151.0
2018-03-29 09:30:00 908.90 910.45 908.80 910.15 46429.0
I want to select only the first row for each unique date (ignoring the time), so that I get output like this:
Date_Time Open High Low Close Volume
2018-01-22 11:05:00 948.00 948.10 947.95 948.10 9820.0
2018-03-27 09:15:00 907.20 908.80 905.00 908.15 126343.0
2018-03-29 09:30:00 908.90 910.45 908.80 910.15 46429.0
I tried using loc and iloc, but they did not help.
Any help would be greatly appreciated.
Answer 0 (score: 3)
You need to group by date and take the first row of each group:
import pandas as pd
data = [['2018-01-22 11:05:00', 948.00, 948.10, 947.95, 948.10, 9820.0],
['2018-01-22 11:06:00', 948.10, 949.60, 948.05, 949.30, 33302.0],
['2018-01-22 11:07:00', 949.25, 949.85, 949.20, 949.85, 20522.0],
['2018-03-27 09:15:00', 907.20, 908.80, 905.00, 908.15, 126343.0],
['2018-03-27 09:16:00', 908.20, 909.20, 906.55, 906.60, 38151.0],
['2018-03-29 09:30:00', 908.90, 910.45, 908.80, 910.15, 46429.0]]
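# Build the frame and use the timestamp column (column 0) as the index
df = pd.DataFrame(data=data)
df = df.set_index([0])
df.columns = ['Open', 'High', 'Low', 'Close', 'Volume']
# Group rows by calendar date (time dropped) and keep the first row of each group
result = df.groupby(pd.to_datetime(df.index).date).head(1)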
print(result)
Output
Open High Low Close Volume
0
2018-01-22 11:05:00 948.0 948.10 947.95 948.10 9820.0
2018-03-27 09:15:00 907.2 908.80 905.00 908.15 126343.0
2018-03-29 09:30:00 908.9 910.45 908.80 910.15 46429.0
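If the index is already a real DatetimeIndex (as the question describes), the same result can probably be reached without groupby by flagging repeated dates. The following is only a sketch under that assumption: the trimmed-down sample columns and the names `rows` and `first_per_day` are illustrative, and it assumes the rows are already in chronological order.

```python
import pandas as pd

# Reduced sample of the question's data (only two value columns, for brevity)
rows = [
    ("2018-01-22 11:05:00", 948.10, 9820.0),
    ("2018-01-22 11:06:00", 949.30, 33302.0),
    ("2018-01-22 11:07:00", 949.85, 20522.0),
    ("2018-03-27 09:15:00", 908.15, 126343.0),
    ("2018-03-27 09:16:00", 906.60, 38151.0),
    ("2018-03-29 09:30:00", 910.15, 46429.0),
]
df = pd.DataFrame(rows, columns=["Date_Time", "Close", "Volume"])

# Parse the strings so the index is a true DatetimeIndex
df["Date_Time"] = pd.to_datetime(df["Date_Time"])
df = df.set_index("Date_Time")

# normalize() resets every timestamp to midnight, so rows from the same day
# collide; duplicated(keep="first") marks every row after the first of a day
is_repeat = df.index.normalize().duplicated(keep="first")
first_per_day = df[~is_repeat]

print(first_per_day)
```

Both this mask and `groupby(...).head(1)` keep whole rows and the original timestamps. `groupby(...).first()` is not equivalent: it returns the first non-null value per column and replaces the index with the group key.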