What is the fastest way to compute quantiles on a grouped DataFrame?

Time: 2019-08-22 11:40:53

Tags: pandas numpy

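I am creating monthly diurnal (time-of-day) plots from a pandas DataFrame. I need to plot the mean, the median, or an arbitrary quantile. My implementation gives correct results, but on large data the quantile computation is much slower than the mean or median computation. Is there a faster way to do this?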

import pandas as pd
import numpy as np
import datetime as dt

date_range = pd.date_range(start=dt.datetime(2018,1,1,00,00), end=dt.datetime(2018,12,31,23,59), freq='1min')
N = len(date_range)
df = pd.DataFrame({'Test': np.random.rand(N)}, index=date_range)
df['Time'] = df.index.time
df['Month'] = df.index.month

quantiles = [0.23, 0.72]

# Accumulate the elapsed time of each kind of aggregation across all months
time_mean_median = dt.timedelta(0)
time_quantiles = dt.timedelta(0)
for i in range(12):
    # Select the rows that belong to month i + 1
    df_month = df.loc[df['Month'] == i + 1, ['Test', 'Time']]

    # Mean and median for each time of day
    start_time = dt.datetime.now()
    df1_group = df_month.groupby('Time').agg(['mean', 'median'])
    time_mean_median += dt.datetime.now() - start_time

    # Quantiles for each time of day, one column per quantile
    start_time = dt.datetime.now()
    df2_group = df_month.groupby('Time').quantile(q=quantiles).unstack()
    time_quantiles += dt.datetime.now() - start_time

print('Mean/median computation time {}'.format(time_mean_median))
print('Quantile computation time {}'.format(time_quantiles))

In this example, the total mean/median computation time is about 0.7 seconds, while the quantile computation takes nearly 12 seconds.
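One possible direction, sketched here under strong assumptions rather than as a verified fix: because the example data sits on a gap-free 1-minute grid, each month's values can be reshaped into a 2-D array (one row per day, one column per minute of the day) and passed to `np.quantile` along the day axis in a single call. The helper name `monthly_time_quantiles` is hypothetical, and the reshape only works if the series is sorted and every day has all 1440 samples. Whether this is actually faster should be measured on the real data.

```python
import numpy as np
import pandas as pd


def monthly_time_quantiles(series, quantiles):
    """Per-month, per-time-of-day quantiles for a sorted, gap-free 1-minute series.

    Hypothetical helper: assumes every day contributes exactly 1440 samples.
    Returns a dict mapping month number to a DataFrame indexed by time of day.
    """
    minutes_per_day = 24 * 60
    results = {}
    for month, chunk in series.groupby(series.index.month):
        # Rows are days, columns are minutes of the day.
        daily = chunk.to_numpy().reshape(-1, minutes_per_day)
        # One vectorized call; result has shape (len(quantiles), 1440).
        q = np.quantile(daily, quantiles, axis=0)
        results[month] = pd.DataFrame(
            q.T,
            index=chunk.index[:minutes_per_day].time,
            columns=quantiles,
        )
    return results
```

With the DataFrame from the question, this would be called as `monthly_time_quantiles(df['Test'], [0.23, 0.72])`. `np.quantile` uses linear interpolation by default, as does pandas' `quantile`, so the values should agree with the `groupby` version. For data with gaps or irregular sampling the reshape would fail, and a different strategy would be needed.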

0 answers:

No answers