Finding pandas quartiles based on another column

Time: 2015-09-06 20:34:34

Tags: python pandas

I have a dataframe:

Av_Temp Tot_Precip
278.001 0
274     0.0751864
270.294 0.631634
271.526 0.229285
272.246 0.0652201
273     0.0840059
270.463 0.0602944
269.983 0.103563
268.774 0.0694555
269.529 0.010908
270.062 0.043915
271.982 0.0295718

I want to find the percentile values (25%, 50%, 75%) of the column 'Tot_Precip' for each decile (the top 10%, the next 10%, ...) of the column 'Av_Temp'. Currently, I do this:

import numpy, pandas, pdb
expl_var = 'Av_Temp'
cname    = 'Tot_Precip'
num_samples = 10  # numpy.linspace expects an integer count
max_val = df[expl_var].max()
min_val = df[expl_var].min()

expl_bins = numpy.linspace(min_val, max_val, num = num_samples)

for index, val in enumerate(expl_bins):
    print(index)
    if index < (len(expl_bins) - 1):
        cur_val = val
        nxt_val = expl_bins[index+1]

        # Subset dataframe to rows with values of expl_var between
        # cur_val and nxt_val; copy so a new column can be added safely
        sub_ind_df = df[(df[expl_var] >= cur_val) & (df[expl_var] <= nxt_val)].copy()

        sub_ind_df[cname+'_quartiles'] = pandas.qcut(sub_ind_df[cname], 4)
        # Merge with sub_df
        pdb.set_trace()

I am not sure how to proceed after this.

The answer might look like:

Av_Temp_decile     Tot_Precip_25      Tot_Precip_50    Tot_Precip_75
270 - 272           0.03                  0.05               0.08

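1 answer:

Answer 0: (score: 1)

Because the sample dataset is small, I split the data into halves here rather than deciles, but everything should work the same if you increase the number of bins in the initial cut: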

import pandas as pd

# Change this to 10 to get deciles
df['Temp_Halves'] = pd.qcut(df['Av_Temp'], 2)

def get_quartiles(group):
    # Add retbins=True to get the bin edges
    qs, bins = pd.qcut(group['Tot_Precip'], [.25, .5, .75], retbins=True)
    # Returning a series from a function means groupby.apply() will
    #   expand it into separate columns
    return pd.Series(bins, index=['Precip_25', 'Precip_50', 'Precip_75'])

df.groupby('Temp_Halves').apply(get_quartiles)
Out[21]: 
                    Precip_25  Precip_50  Precip_75
Temp_Halves                                        
[268.774, 270.995]   0.048010   0.064875   0.095036
(270.995, 278.001]   0.038484   0.070203   0.081801
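
As a possible variation on the approach above, the same table can be sketched with groupby().quantile() instead of reading the bin edges back from qcut. This is only a minimal sketch, not the answerer's method: the helper name precip_quantiles_by_temp_bin, its n_bins parameter, and the output names Av_Temp_bin and Tot_Precip_25/50/75 are assumptions made for illustration (the column names follow the desired output in the question). The data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Av_Temp": [278.001, 274, 270.294, 271.526, 272.246, 273,
                270.463, 269.983, 268.774, 269.529, 270.062, 271.982],
    "Tot_Precip": [0, 0.0751864, 0.631634, 0.229285, 0.0652201, 0.0840059,
                   0.0602944, 0.103563, 0.0694555, 0.010908, 0.043915, 0.0295718],
})


def precip_quantiles_by_temp_bin(frame, n_bins=2):
    """Hypothetical helper: quartiles of Tot_Precip within each
    equal-count bin of Av_Temp (n_bins=10 would give deciles)."""
    # qcut splits Av_Temp into n_bins bins with roughly equal row counts.
    temp_bins = pd.qcut(frame["Av_Temp"], n_bins)
    result = (
        frame.groupby(temp_bins, observed=True)["Tot_Precip"]
        .quantile([0.25, 0.5, 0.75])  # one row per (bin, quantile) pair
        .unstack()                    # move the quantile level into columns
    )
    result.columns = ["Tot_Precip_25", "Tot_Precip_50", "Tot_Precip_75"]
    result.index.name = "Av_Temp_bin"
    return result


quartiles = precip_quantiles_by_temp_bin(df, n_bins=2)
print(quartiles)
```

With n_bins=2 this should give the same quartile values as the output shown above. Passing n_bins=10 gives one row per decile, although with only 12 rows each decile holds just one or two observations, so the quartiles are not very informative on this sample.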