Here is a simplified version of my data:
Date and Time Price Volume
2015-01-01 17:00:00.211 2030.25 342
2015-01-01 17:00:02.456 2030.75 725
2015-01-01 17:00:02.666 2030.75 203
2015-01-02 17:00:00.074 2031.00 101
2015-01-02 17:00:16.221 2031.75 245
2015-01-02 17:00:25.882 2031.75 100
2015-01-03 17:00:00.054 2031.00 180
2015-01-03 17:00:25.098 2031.75 849
2015-01-03 17:00:45.188 2031.75 549
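I want a subset of the DataFrame that holds, for each day, the row with the minimum value of the 'Volume' column, together with the corresponding 'Date and Time' and 'Price'. The output would be: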
Date and Time Price Volume
2015-01-01 17:00:02.666 2030.75 203
2015-01-02 17:00:25.882 2031.75 100
2015-01-03 17:00:00.054 2031.00 180
Thanks
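Answer 0 (score: 2)
The simplest approach is to split Date and Time into two separate columns. As you said in your post, you need the "min of the column 'Volume' each day", so once Date is its own column you can group on it directly.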
Date Time Price Volume
0 2015-01-01 17:00:00.211 2030.25 342
1 2015-01-01 17:00:02.456 2030.75 725
2 2015-01-01 17:00:02.666 2030.75 203
3 2015-01-02 17:00:00.074 2031.00 101
4 2015-01-02 17:00:16.221 2031.75 245
5 2015-01-02 17:00:25.882 2031.75 100
6 2015-01-03 17:00:00.054 2031.00 180
7 2015-01-03 17:00:25.098 2031.75 849
8 2015-01-03 17:00:45.188 2031.75 549
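One way to get from a single 'Date and Time' column to the two-column layout above is a string split on the first space. This is only a sketch: it assumes the column holds strings in the `YYYY-MM-DD HH:MM:SS.fff` form shown in the question, and the three-row frame here is a shortened stand-in for the real data.

```python
import pandas as pd

# Shortened stand-in for the question's data (assumption: timestamps are strings).
df = pd.DataFrame({
    "Date and Time": [
        "2015-01-01 17:00:00.211",
        "2015-01-01 17:00:02.456",
        "2015-01-02 17:00:00.074",
    ],
    "Price": [2030.25, 2030.75, 2031.00],
    "Volume": [342, 725, 101],
})

# Split on the first space only: column 0 is the date, column 1 is the time.
parts = df["Date and Time"].str.split(" ", n=1, expand=True)

# Put the new columns first and drop the combined one.
df.insert(0, "Date", parts[0])
df.insert(1, "Time", parts[1])
df = df.drop(columns="Date and Time")

print(df)
```

If the column is already a datetime dtype, `df["Date and Time"].dt.date` and `.dt.time` would do the same job without string handling.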
# keep the result in a new name so df is still the full DataFrame
daily_min = df.groupby('Date')['Volume'].min()
print(daily_min)
The output is the minimum value of the Volume column for each day.
Date
2015-01-01 203
2015-01-02 100
2015-01-03 180
Name: Volume, dtype: object
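The `dtype: object` in that output suggests Volume may have been read in as strings rather than numbers. That happens to give the right answer for this sample, but string comparison is lexicographic, so it can pick the wrong row on other data. A minimal sketch of the difference, using made-up values chosen to show the problem:

```python
import pandas as pd

# Made-up values (not from the question) stored as strings on purpose.
df = pd.DataFrame({
    "Date": ["2015-01-01", "2015-01-01", "2015-01-01"],
    "Volume": ["95", "203", "1000"],
})

# Lexicographic comparison: "1000" sorts before "203" and "95".
text_min = df.groupby("Date")["Volume"].min()

# Convert to numbers first to get the true minimum.
df["Volume"] = pd.to_numeric(df["Volume"])
numeric_min = df.groupby("Date")["Volume"].min()

print(text_min)
print(numeric_min)
```

Checking `df.dtypes` before grouping is a cheap way to catch this.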
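Edit: If you also want the matching rows of the original DataFrame (with the corresponding time and price), you can do this:
# df must still be the full DataFrame here, not the grouped result
idx = df.groupby(['Date'])['Volume'].transform('min') == df['Volume']
df[idx]
The output in this case: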
Date Time Price Volume
2 2015-01-01 17:00:02.666 2030.75 203
5 2015-01-02 17:00:25.882 2031.75 100
6 2015-01-03 17:00:00.054 2031.00 180
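One thing to be aware of with the boolean mask: if two rows on the same day share the minimum volume, both are kept. If exactly one row per day is wanted, `idxmin` is an alternative that returns only the first minimal row. The sketch below uses a small made-up frame with a deliberate tie to show the difference.

```python
import pandas as pd

# Made-up data with a tie on 2015-01-01 (two rows with Volume 203).
df = pd.DataFrame({
    "Date": ["2015-01-01", "2015-01-01", "2015-01-01", "2015-01-02"],
    "Time": ["17:00:00.211", "17:00:02.456", "17:00:02.666", "17:00:25.882"],
    "Volume": [203, 725, 203, 100],
})

# Mask approach: every row equal to its day's minimum is kept.
mask = df.groupby("Date")["Volume"].transform("min") == df["Volume"]
all_ties = df[mask]

# idxmin approach: one index label per day, the first minimal row.
first_only = df.loc[df.groupby("Date")["Volume"].idxmin()]

print(all_ties)
print(first_only)
```

Which behavior is right depends on whether tied rows are meaningful in your data.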
Answer 1 (score: 1)
Group the rows by day, then take the row with the lowest volume for each day:
from pandas import DatetimeIndex, DataFrame
df = DataFrame(...)
times = DatetimeIndex(df['Date and Time'])
# group by calendar date; times.day alone would merge the same day-of-month across months
grouped = df.groupby(times.normalize())
# takes DataFrame as input; returns the DataFrame row with the lowest 'Volume'
# (.loc replaces .ix, which has been removed from pandas)
find_min = lambda cur_df: cur_df.loc[cur_df['Volume'].idxmin()]
# assemble a DataFrame from Series objects
result = DataFrame([find_min(x[1]) for x in grouped])
result = result.reset_index(drop=True) # optional re-indexing
print(result)
Output:
Date and Time Price Volume
0 2015-01-01 17:00:02.666 2030.75 203
1 2015-01-02 17:00:25.882 2031.75 100
2 2015-01-03 17:00:00.054 2031.00 180
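The same per-day selection can also be written without looping over the groups in Python. The following is a sketch, not the answer's own code: it rebuilds the question's sample data inline and assumes 'Date and Time' can be parsed by `pd.to_datetime`.

```python
import pandas as pd

# The sample data from the question, with timestamps parsed to datetime64.
df = pd.DataFrame({
    "Date and Time": pd.to_datetime([
        "2015-01-01 17:00:00.211", "2015-01-01 17:00:02.456", "2015-01-01 17:00:02.666",
        "2015-01-02 17:00:00.074", "2015-01-02 17:00:16.221", "2015-01-02 17:00:25.882",
        "2015-01-03 17:00:00.054", "2015-01-03 17:00:25.098", "2015-01-03 17:00:45.188",
    ]),
    "Price": [2030.25, 2030.75, 2030.75, 2031.00, 2031.75, 2031.75, 2031.00, 2031.75, 2031.75],
    "Volume": [342, 725, 203, 101, 245, 100, 180, 849, 549],
})

# Midnight-normalized timestamps act as the per-day grouping key.
day = df["Date and Time"].dt.normalize()

# idxmin gives the index label of the lowest-volume row in each day;
# .loc then pulls those full rows out of the original frame.
result = df.loc[df.groupby(day)["Volume"].idxmin()].reset_index(drop=True)

print(result)
```

This keeps the original column dtypes, whereas building a DataFrame from a list of row Series tends to produce object columns.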