How do I apply a function to subsets of a pandas DataFrame by date?

Time: 2015-11-26 02:07:20

Tags: python pandas

Here is a simplified version of my data:

Date and Time           Price   Volume
2015-01-01 17:00:00.211 2030.25 342
2015-01-01 17:00:02.456 2030.75 725
2015-01-01 17:00:02.666 2030.75 203
2015-01-02 17:00:00.074 2031.00 101
2015-01-02 17:00:16.221 2031.75 245
2015-01-02 17:00:25.882 2031.75 100
2015-01-03 17:00:00.054 2031.00 180
2015-01-03 17:00:25.098 2031.75 849
2015-01-03 17:00:45.188 2031.75 549

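I want a subset of the DataFrame that selects, for each day, the row with the minimum value in the 'Volume' column, along with the corresponding 'Date and Time' and 'Price'. The output would be: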

Date and Time           Price   Volume
2015-01-01 17:00:02.666 2030.75 203
2015-01-02 17:00:25.882 2031.75 100
2015-01-03 17:00:00.054 2031.00 180

Thanks

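2 answers:

Answer 0: (score: 2)

The simplest way is to split the DateTime into two separate columns, Date and Time. As you said in your post, you need the "min of the column 'Volume' for each day", so once the date is in its own column you can group on it.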

    Date        Time            Price    Volume
0   2015-01-01  17:00:00.211    2030.25     342
1   2015-01-01  17:00:02.456    2030.75     725
2   2015-01-01  17:00:02.666    2030.75     203
3   2015-01-02  17:00:00.074    2031.00     101
4   2015-01-02  17:00:16.221    2031.75     245
5   2015-01-02  17:00:25.882    2031.75     100
6   2015-01-03  17:00:00.054    2031.00     180
7   2015-01-03  17:00:25.098    2031.75     849
8   2015-01-03  17:00:45.188    2031.75     549
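
The answer does not show how the split is done. A minimal sketch, assuming the timestamps start out as text in a single 'Date and Time' column (the variable name `stamps` is introduced here for illustration only):

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Date and Time": [
        "2015-01-01 17:00:00.211", "2015-01-01 17:00:02.456", "2015-01-01 17:00:02.666",
        "2015-01-02 17:00:00.074", "2015-01-02 17:00:16.221", "2015-01-02 17:00:25.882",
        "2015-01-03 17:00:00.054", "2015-01-03 17:00:25.098", "2015-01-03 17:00:45.188",
    ],
    "Price": [2030.25, 2030.75, 2030.75, 2031.00, 2031.75, 2031.75, 2031.00, 2031.75, 2031.75],
    "Volume": [342, 725, 203, 101, 245, 100, 180, 849, 549],
})

# Parse the text timestamps and remove the combined column.
stamps = pd.to_datetime(df.pop("Date and Time"))
# Derive one column per component and put them at the front.
df.insert(0, "Date", stamps.dt.date)
df.insert(1, "Time", stamps.dt.time)

print(df)
```

Keeping the parsed timestamps around (rather than splitting the strings) also makes it possible to group by week or month later without re-parsing.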

# keep the result in its own variable so that df still holds the full table
daily_min = df.groupby('Date')['Volume'].min()
print(daily_min)

The output is the minimum of the Volume column for each day.

Date
2015-01-01    203
2015-01-02    100
2015-01-03    180
Name: Volume, dtype: object
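
The `dtype: object` in the printed result suggests that 'Volume' may hold strings rather than numbers. This is an assumption, since the post does not say how the data was loaded. Strings are compared character by character, so a minimum taken on them can be wrong once the values differ in length. A small sketch with hypothetical values, converting with `pd.to_numeric` before taking the minimum:

```python
import pandas as pd

# Hypothetical values, chosen only to show the difference.
volume_text = pd.Series(["203", "1000", "725"])

# String comparison: "1000" sorts before "203" because "1" < "2".
print(volume_text.min())

# Numeric comparison after conversion.
print(pd.to_numeric(volume_text).min())
```

If the column is numeric already, the conversion is harmless and the grouped minimum should report an integer or float dtype.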

Edit: if you also want the matching rows of the original DataFrame (with the corresponding time and price), you can do this:

# idx is a boolean mask: True where a row's Volume equals the minimum for its date
idx = df.groupby(['Date'])['Volume'].transform('min') == df['Volume']
df[idx]

Output in this case:

    Date        Time            Price    Volume
2   2015-01-01  17:00:02.666    2030.75     203
5   2015-01-02  17:00:25.882    2031.75     100
6   2015-01-03  17:00:00.054    2031.00     180
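
One thing to keep in mind with the boolean-mask approach: if two rows on the same day share the minimum volume, both are kept. A minimal sketch with hypothetical rows (not from the question), showing the tie and one possible way to keep a single row per day with `drop_duplicates`:

```python
import pandas as pd

# Hypothetical rows: the first day has two rows tied at the minimum volume.
df = pd.DataFrame({
    "Date": ["2015-01-01", "2015-01-01", "2015-01-02"],
    "Time": ["17:00:00.211", "17:00:02.456", "17:00:00.074"],
    "Volume": [100, 100, 101],
})

# True for every row whose Volume equals its day's minimum.
is_daily_min = df["Volume"] == df.groupby("Date")["Volume"].transform("min")

tied = df[is_daily_min]                     # keeps both tied rows
first_only = tied.drop_duplicates("Date")   # keeps the first row per day

print(tied)
print(first_only)
```

Whether ties should be kept or collapsed depends on what the result is used for; the question's sample data has no ties, so both variants agree there.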

Answer 1: (score: 1)

Group the rows by day, then take the row with the smallest volume for each day:

from pandas import DatetimeIndex, DataFrame

df = DataFrame(...)
times = DatetimeIndex(df['Date and Time'])
# group by calendar date (year, month and day), not just the day of the month
grouped = df.groupby(times.date)

# takes DataFrame as input; returns the DataFrame row with the lowest 'Volume'
find_min = lambda cur_df: cur_df.loc[cur_df['Volume'].idxmin()]
# assemble a DataFrame from Series objects, one per day
result = DataFrame([find_min(day_df) for _, day_df in grouped])
result = result.reset_index(drop=True)   # optional re-indexing

print(result)

Output:

             Date and Time    Price Volume
0  2015-01-01 17:00:02.666  2030.75    203
1  2015-01-02 17:00:25.882  2031.75    100
2  2015-01-03 17:00:00.054  2031.00    180
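
For comparison, the same result can usually be obtained without a Python-level loop by grouping on the calendar date and passing the `idxmin` labels to `.loc`. This is a sketch rather than part of the original answer; it assumes 'Date and Time' is already parsed as datetimes and that the index labels are unique. The rows are a subset of the question's data, and the name `day` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date and Time": pd.to_datetime([
        "2015-01-01 17:00:00.211", "2015-01-01 17:00:02.666",
        "2015-01-02 17:00:00.074", "2015-01-02 17:00:25.882",
        "2015-01-03 17:00:00.054", "2015-01-03 17:00:25.098",
    ]),
    "Price": [2030.25, 2030.75, 2031.00, 2031.75, 2031.00, 2031.75],
    "Volume": [342, 203, 101, 100, 180, 849],
})

# Midnight of each timestamp's day serves as the group key.
day = df["Date and Time"].dt.normalize()

# idxmin returns, per day, the index label of the lowest-Volume row;
# .loc then pulls those complete rows from the original frame.
result = df.loc[df.groupby(day)["Volume"].idxmin()].reset_index(drop=True)

print(result)
```

Unlike the boolean-mask approach, `idxmin` returns only the first matching label per group, so each day contributes exactly one row.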