Maximum and minimum values in a pandas DataFrame

Time: 2020-02-11 15:04:39

Tags: python

I have a pandas DataFrame that holds hourly temperature readings for 1990, as shown below:

           Date and time  Dry bulb temperature
0    1990-01-01 00:00:00                   8.2
1    1990-01-01 01:00:00                   8.1
2    1990-01-01 02:00:00                   8.3
3    1990-01-01 03:00:00                   8.5
4    1990-01-01 04:00:00                   8.8
...                  ...                   ...
8755 1990-12-31 19:00:00                   3.0
8756 1990-12-31 20:00:00                   2.6
8757 1990-12-31 21:00:00                   2.8
8758 1990-12-31 22:00:00                   4.2
8759 1990-12-31 23:00:00                   2.0

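I want to compute the maximum dry bulb temperature for each 24-hour period and get the corresponding date and time. How should I go about this?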

So far, I have:

o=[]
for i in range(0, len(Dataframe['Dry bulb temperature']), 24):
    ymax = np.max(Dataframe['Dry bulb temperature'][i:i+24])
    o.append(ymax)
print(o)

This gives the maximum temperature for each 24-hour period, as follows:

[9.7, 9.9, 8.4, 10.4, 11.2, 12.0, 10.5, 10.7, 11.9, 12.0, 11.5, 11.4, 10.2, 10.9, 13.6, 11.5, 9.6, 10.9, 10.8, 12.3, 12.3, 12.2, 11.5, 7.9, 12.7, 6.0, 9.4, 8.2, 9.8, 10.6, 9.6, 8.8, 10.8, 8.6, 11.9, 11.7, 12.2, 13.8, 12.5, 10.8, 13.2, 8.2, 7.4, 12.1, 12.4, 8.6, 7.7, 12.3, 13.3, 12.3, 13.1, 12.0, 12.7, 11.5, 12.7, 12.5, 12.5, 8.7, 13.2, 7.7, 9.0, 10.1, 10.6, 10.9, 11.9, 11.4, 13.3, 12.2, 15.0, 14.1, 13.1, 12.9, 13.7, 12.7, 12.7, 16.3, 14.9, 12.8, 11.8, 14.2, 11.5, 11.7, 10.4, 10.1, 9.9, 9.6, 10.6, 12.7, 16.0, 15.3, 14.4, 14.2, 8.6, 7.0, 9.8, 11.6, 12.6, 11.1, 12.3, 12.2, 14.8, 15.2, 11.3, 12.1, 12.0, 12.3, 11.5, 10.8, 10.0, 11.7, 15.3, 12.9, 17.0, 17.6, 18.9, 14.2, 13.3, 14.9, 17.8, 20.6, 21.9, 24.1, 26.8, 25.4, 24.9, 23.5, 16.4, 14.9, 13.8, 14.2, 17.7, 17.9, 16.8, 15.7, 16.3, 18.9, 19.4, 18.3, 14.5, 17.6, 18.8, 18.1, 21.9, 18.2, 14.7, 14.9, 19.4, 20.0, 14.9, 18.9, 16.8, 17.6, 15.8, 14.6, 17.0, 15.6, 16.4, 15.0, 13.9, 18.5, 22.7, 16.4, 16.8, 15.6, 16.7, 19.0, 19.0, 17.2, 17.6, 18.7, 17.4, 15.5, 18.2, 17.8, 18.5, 21.9, 19.7, 21.2, 16.6, 17.3, 16.5, 16.3, 17.2, 18.5, 18.1, 17.3, 16.9, 21.3, 22.6, 17.5, 18.9, 21.9, 26.2, 26.5, 24.7, 25.3, 24.2, 23.3, 22.6, 23.1, 27.6, 30.2, 27.2, 22.1, 19.7, 22.6, 21.1, 23.8, 24.7, 22.1, 22.4, 23.7, 26.9, 29.2, 32.3, 30.0, 21.4, 22.2, 22.0, 23.0, 21.2, 22.6, 23.4, 24.9, 22.6, 19.7, 21.1, 18.9, 18.6, 22.0, 22.2, 19.4, 20.5, 24.8, 24.1, 27.0, 24.8, 25.1, 21.2, 22.6, 20.1, 18.3, 18.8, 20.6, 25.6, 22.1, 18.8, 17.7, 16.7, 18.4, 17.9, 20.2, 21.8, 20.6, 20.5, 21.0, 21.3, 19.6, 18.1, 17.4, 18.8, 16.0, 15.8, 15.9, 16.0, 14.4, 15.3, 16.4, 18.3, 17.3, 18.8, 17.3, 19.2, 16.0, 16.9, 16.4, 15.7, 19.7, 16.5, 14.0, 14.5, 14.7, 17.7, 15.2, 19.8, 18.6, 17.8, 18.0, 16.2, 16.7, 17.1, 17.7, 16.6, 16.1, 13.3, 16.3, 14.8, 14.8, 12.5, 12.8, 13.6, 10.2, 14.0, 12.9, 11.4, 10.7, 10.3, 10.4, 8.7, 9.7, 10.4, 11.0, 13.4, 13.9, 12.9, 16.3, 16.2, 13.1, 14.1, 15.8, 15.3, 12.0, 11.9, 9.7, 9.1, 6.7, 8.8, 7.4, 5.4, 7.9, 7.3, 6.3, 7.6, 8.1, 7.3, 6.6, 9.0, 10.0, 7.4, 4.7, 
9.6, 4.0, 3.3, 7.0, 9.7, 10.1, 5.4, 3.4, 3.7, 5.0, 2.3, 3.6, 6.9, 9.4, 12.1, 11.4, 10.1, 10.2, 9.7, 13.7, 7.3, 11.5, 9.4, 9.6, 9.0]

I would like to get the corresponding date and time for each maximum temperature, in the following form:

[9.7,1990-01-02 03:00:00],...,etc. 

2 answers:

Answer 0 (score: 0)

You can use this:

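import pandas as pd
df['Date and time'] = pd.to_datetime(df['Date and time'])
# Passing a list of function names yields one column per function (dict renaming on a single column fails in pandas 1.0+)
df1 = df.set_index('Date and time').resample('D')['Dry bulb temperature'].agg(['max', 'min'])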

For the rows visible in your question, it gives the following output:

               max  min
Date and time          
1990-01-01     8.8  8.1
1990-12-31     4.2  2.0
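
To see the shape of the result without the full dataset, here is a minimal self-contained sketch on synthetic data. The two-day hourly series and the names `demo` and `daily` are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Synthetic data (an assumption for illustration): 48 hourly readings over two days
demo = pd.DataFrame({
    'Date and time': pd.date_range('1990-01-01', periods=48, freq='h'),
    'Dry bulb temperature': [float(i % 24) for i in range(48)],
})

# One row per calendar day, holding that day's maximum and minimum
daily = (
    demo.set_index('Date and time')
        .resample('D')['Dry bulb temperature']
        .agg(['max', 'min'])
)
print(daily)
```

Note that `resample('D')` bins by calendar day, so it does not depend on every day having exactly 24 rows, unlike slicing in steps of 24.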

If you really want the result as a list, you can then use:

df1.reset_index().to_numpy()
[array([Timestamp('1990-01-01 00:00:00'), 8.8, 8.1], dtype=object),
 array([Timestamp('1990-12-31 00:00:00'), 4.2, 2.0], dtype=object)]

To get the exact date and time of each day's maximum, you can try the following:

df2 = df.set_index('Date and time')
df2.loc[df2.groupby(df2.index.dayofyear).idxmax().iloc[:, 0]]

                     Dry bulb temperature
Date and time                            
1990-01-01 04:00:00                   8.8
1990-12-31 22:00:00                   4.2
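
To get the `[temperature, timestamp]` pairs the question asks for, the same `idxmax` idea can be sketched as follows. The synthetic data and the names `demo`, `idx`, `best` and `pairs` are assumptions for illustration. Grouping by calendar date instead of day-of-year is a small variation that also keeps days from different years apart.

```python
import pandas as pd

# Synthetic data (an assumption for illustration): two days of hourly readings
times = pd.date_range('1990-01-01', periods=48, freq='h')
temps = [5.0] * 48
temps[15] = 9.7       # peak on 1990-01-01 at 15:00
temps[24 + 3] = 9.9   # peak on 1990-01-02 at 03:00
demo = pd.DataFrame({'Date and time': times, 'Dry bulb temperature': temps})

# Index label of each day's maximum reading (the first one if there are ties)
idx = demo.groupby(demo['Date and time'].dt.date)['Dry bulb temperature'].idxmax()

# Look up those rows and build [temperature, timestamp] pairs
best = demo.loc[idx]
pairs = [[t, ts] for t, ts in zip(best['Dry bulb temperature'], best['Date and time'])]
print(pairs)
```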

Answer 1 (score: 0)

You can try this:

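from datetime import timedelta

import pandas as pd

# The comparisons below need real datetimes, not strings
df['Date and time'] = pd.to_datetime(df['Date and time'])

day = df['Date and time'].min().normalize()  # midnight of the first day
max_day = df['Date and time'].max()

results = []
while day <= max_day:
    # Rows that fall within the current day
    temp = df[(df['Date and time'] >= day) & (df['Date and time'] < day + timedelta(days=1))]
    if not temp.empty:  # skip days with no readings
        # Row with the maximum temperature; idxmax returns an index label, so use .loc
        row = df.loc[temp['Dry bulb temperature'].idxmax()]
        results.append([row['Dry bulb temperature'], row['Date and time']])
    day += timedelta(days=1)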
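
To try the loop on a small input, it can be wrapped in a function and run on synthetic data. The function name `daily_max_rows` and the sample values are hypothetical. This is a sketch of the same day-by-day scan, assuming the date column already holds datetimes, not a tuned implementation.

```python
from datetime import timedelta

import pandas as pd


def daily_max_rows(df):
    """Return [temperature, timestamp] for the warmest reading of each day."""
    day = df['Date and time'].min().normalize()  # start at midnight of the first day
    last = df['Date and time'].max()
    results = []
    while day <= last:
        in_day = (df['Date and time'] >= day) & (df['Date and time'] < day + timedelta(days=1))
        temp = df[in_day]
        if not temp.empty:  # skip days without readings
            row = df.loc[temp['Dry bulb temperature'].idxmax()]
            results.append([row['Dry bulb temperature'], row['Date and time']])
        day += timedelta(days=1)
    return results


# Synthetic data (an assumption): three readings spread over two days
sample = pd.DataFrame({
    'Date and time': pd.to_datetime(
        ['1990-01-01 04:00:00', '1990-01-01 05:00:00', '1990-01-02 03:00:00']),
    'Dry bulb temperature': [8.8, 8.1, 9.7],
})
print(daily_max_rows(sample))
```

Each pass filters the whole DataFrame, so the cost grows with the number of days times the number of rows. That is fine for one year of hourly data, but a grouped `idxmax` is likely a better fit for much larger inputs.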