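GroupBy and compare elements against their group's quantile in Python

Time: 2018-05-02 17:45:26

Tags: python pandas dataframe compare

I am new to Python and pandas. I have a CSV with dates and temperature values covering more than 30 years, and for my purposes I compute a quantile for each group, grouping by Month.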

import pandas as pd
input_file = pd.read_csv("DailyMax_lat-14_lon35.csv")
# 0.5 quantile (median) of TmaxScaled for each month
sQ = input_file.groupby(['Month'])['TmaxScaled'].quantile(0.5)

print(sQ)
Month
1     297.977336
2     298.190348
3     298.433919
4     298.580322
5     298.221629
6     296.736598
7     296.463704
8     298.701436
9     302.380452
10    304.102163
11    303.562688
12    299.231298

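Now I would like to split the DataFrame into two DataFrames: one with the values that are smaller than the quantile computed for their month, and one with the values that are larger.

Can you help me?

print(input_file) gives: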

    Year  Month  Day  TmaxScaled
0   1980      1    3  296.941457
1   1980      1    4  296.978455
2   1980      1    5  296.654368
3   1980      1    6  296.732218
4   1980      1    7  297.468730
5   1980      1    8  298.330566
6   1980      1    9  297.844157
7   1980      1   10  297.228007
8   1980      1   11  296.916066
9   1980      1   12  297.247884
10  1980      1   13  297.851888
11  1980      1   14  298.854523

2 answers:

Answer 0 (score: 1)

You probably need transform, which broadcasts each month's quantile back onto every row of that month:

df['New'] = df.groupby(['Month'])['TmaxScaled'].transform(lambda x: x.quantile(0.5))
df1, df2 = df.loc[df.TmaxScaled > df.New], df.loc[df.TmaxScaled <= df.New]
df1
Out[43]: 
    Year  Month  Day  TmaxScaled         New
4   1980      1    7  297.468730  297.237946
5   1980      1    8  298.330566  297.237946
6   1980      1    9  297.844157  297.237946
9   1980      1   12  297.247884  297.237946
10  1980      1   13  297.851888  297.237946
11  1980      1   14  298.854523  297.237946
df2
Out[44]: 
   Year  Month  Day  TmaxScaled         New
0  1980      1    3  296.941457  297.237946
1  1980      1    4  296.978455  297.237946
2  1980      1    5  296.654368  297.237946
3  1980      1    6  296.732218  297.237946
7  1980      1   10  297.228007  297.237946
8  1980      1   11  296.916066  297.237946
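
The sample above contains only January, so it does not show the per-month behaviour. The following is a minimal sketch of the same transform approach on a two-month frame; the data values and the names month_q, above and below_or_equal are invented for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical toy data with two months (values invented for illustration only)
df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 2, 2, 2, 2],
    "TmaxScaled": [296.0, 297.0, 298.0, 299.0, 300.0, 301.0, 302.0, 303.0],
})

# transform returns a Series aligned with df: every row receives the
# 0.5 quantile of its own month
month_q = df.groupby("Month")["TmaxScaled"].transform(lambda x: x.quantile(0.5))

# Compare each row against its own month's quantile and split
above = df[df["TmaxScaled"] > month_q]
below_or_equal = df[df["TmaxScaled"] <= month_q]

print(above)
print(below_or_equal)
```

Keeping the threshold in a separate Series rather than in a New column leaves the original columns untouched; whether that is preferable depends on whether the threshold is needed later.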

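Answer 1 (score: 1)

Compute sQ as you have already done. Note that .item() returns a scalar only because the sample data contains a single month; with several months it raises a ValueError.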

sQ = df.groupby(['Month']).quantile(0.5)['TmaxScaled'].item()

print(sQ)
297.2379455

Now do a groupby on the comparison result to split df into groups:

df_dict = {k : g for k, g in df.groupby(df.TmaxScaled > sQ)}

df_dict[True] returns a DataFrame with the values above the quantile, and df_dict[False] returns the remaining rows (values at or below it).

df_dict[True]
    Year  Month  Day  TmaxScaled
4   1980      1    7  297.468730
5   1980      1    8  298.330566
6   1980      1    9  297.844157
9   1980      1   12  297.247884
10  1980      1   13  297.851888
11  1980      1   14  298.854523

df_dict[False]    
   Year  Month  Day  TmaxScaled
0  1980      1    3  296.941457
1  1980      1    4  296.978455
2  1980      1    5  296.654368
3  1980      1    6  296.732218
7  1980      1   10  297.228007
8  1980      1   11  296.916066

Note that this preserves the original row order within each group.
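
With more than one month, a single scalar threshold no longer applies. One possible extension, sketched here under the assumption that sQ is kept as a Series indexed by Month, is to look up each row's threshold with map before grouping. The toy data and the name threshold are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with two months (values invented for illustration only)
df = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "TmaxScaled": [296.5, 297.5, 298.5, 301.0, 300.0, 302.0],
})

# Series of per-month 0.5 quantiles, indexed by Month (no .item() here)
sQ = df.groupby("Month")["TmaxScaled"].quantile(0.5)

# Look up the quantile of each row's month, aligned with df's index
threshold = df["Month"].map(sQ)

# Group on the boolean comparison: True = above the month's quantile
df_dict = {k: g for k, g in df.groupby(df["TmaxScaled"] > threshold)}

print(df_dict[True])
print(df_dict[False])
```

If every row happened to fall on one side of its threshold, the corresponding key would be missing from df_dict, so df_dict.get(True) may be the safer lookup in general.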