I have a dataframe:
Av_Temp Tot_Precip
278.001 0
274 0.0751864
270.294 0.631634
271.526 0.229285
272.246 0.0652201
273 0.0840059
270.463 0.0602944
269.983 0.103563
268.774 0.0694555
269.529 0.010908
270.062 0.043915
271.982 0.0295718
I want to find the percentile values (25%, 50%, 75%) of the column 'Tot_Precip' for each decile (top 10%, next 10%, ...) of the column 'Av_Temp'. Currently, I am doing this:
    import pdb

    import numpy
    import pandas

    expl_var = 'Av_Temp'
    cname = 'Tot_Precip'
    num_samples = 10  # numpy.linspace needs an integer here

    max_val = df[expl_var].max()
    min_val = df[expl_var].min()
    # Note: these are equal-width bins over the range, not equal-count deciles
    expl_bins = numpy.linspace(min_val, max_val, num=num_samples)

    for index, val in enumerate(expl_bins):
        print(index)
        if index < (len(expl_bins) - 1):
            cur_val = val
            nxt_val = expl_bins[index + 1]
            # Subset dataframe to rows with values of expl_var between
            # cur_val and nxt_val
            sub_ind_df = df[(df[expl_var] >= cur_val) & (df[expl_var] <= nxt_val)].copy()
            sub_ind_df[cname + '_quartiles'] = pandas.qcut(sub_ind_df[cname], 4)
            # Merge with sub_df
            pdb.set_trace()
I am not sure how to proceed after this.
The answer might look like:
Av_Temp_decile Tot_Precip_25 Tot_Precip_50 Tot_Precip_75
270 - 272 0.03 0.05 0.08
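Answer 0 (score: 1)
Because the sample dataset is small, I split the data into halves here rather than deciles, but everything should work the same if you just increase the number of bins in the initial cut: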
    import pandas as pd

    # Change this to 10 to get deciles
    df['Temp_Halves'] = pd.qcut(df['Av_Temp'], 2)

    def get_quartiles(group):
        # Add retbins=True to get the bin edges
        qs, bins = pd.qcut(group['Tot_Precip'], [.25, .5, .75], retbins=True)
        # Returning a series from a function means groupby.apply() will
        # expand it into separate columns
        return pd.Series(bins, index=['Precip_25', 'Precip_50', 'Precip_75'])

    df.groupby('Temp_Halves').apply(get_quartiles)
    Out[21]:
                        Precip_25  Precip_50  Precip_75
    Temp_Halves
    [268.774, 270.995]   0.048010   0.064875   0.095036
    (270.995, 278.001]   0.038484   0.070203   0.081801
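
As a possible alternative to the custom function, the same table can probably be built with `groupby(...).quantile(...)` followed by `unstack()`. The following is only a minimal sketch: it rebuilds the question's sample data inline, and the names `n_bins`, `temp_bins`, and `quartiles` are made up for illustration. It assumes a pandas version in which `groupby` accepts the `observed` argument.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Av_Temp": [278.001, 274, 270.294, 271.526, 272.246, 273,
                270.463, 269.983, 268.774, 269.529, 270.062, 271.982],
    "Tot_Precip": [0, 0.0751864, 0.631634, 0.229285, 0.0652201, 0.0840059,
                   0.0602944, 0.103563, 0.0694555, 0.010908, 0.043915, 0.0295718],
})

n_bins = 2  # hypothetical name; set to 10 for deciles on a larger dataset

# Equal-count bins of the explanatory variable
temp_bins = pd.qcut(df["Av_Temp"], n_bins)

# Quantiles of Tot_Precip within each temperature bin; unstack() moves the
# quantile level of the resulting MultiIndex into columns
quartiles = (
    df.groupby(temp_bins, observed=True)["Tot_Precip"]
      .quantile([0.25, 0.50, 0.75])
      .unstack()
)
quartiles.columns = ["Tot_Precip_25", "Tot_Precip_50", "Tot_Precip_75"]
print(quartiles)
```

With `n_bins = 2` this should give the same values as the output above. Note that `pd.qcut` produces equal-count bins, whereas the `numpy.linspace` approach in the question produces equal-width bins, so the two will generally group rows differently.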