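I'm new to Python and pandas. I have a CSV with dates and temperature values covering more than 30 years, and for my purposes I compute a quantile for each month using groupby.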
import pandas as pd
input_file = pd.read_csv("DailyMax_lat-14_lon35.csv")
# median (0.5 quantile) of TmaxScaled for each month
sQ = input_file.groupby(['Month'])['TmaxScaled'].quantile(0.5)
print(sQ)
Month
1 297.977336
2 298.190348
3 298.433919
4 298.580322
5 298.221629
6 296.736598
7 296.463704
8 298.701436
9 302.380452
10 304.102163
11 303.562688
12 299.231298
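Now I would like to split the DataFrame into two DataFrames: for each month, one holding the values below the computed quantile and one holding the values above it.
Can you help me?
print(input_file) gives: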
Year Month Day TmaxScaled
0 1980 1 3 296.941457
1 1980 1 4 296.978455
2 1980 1 5 296.654368
3 1980 1 6 296.732218
4 1980 1 7 297.468730
5 1980 1 8 298.330566
6 1980 1 9 297.844157
7 1980 1 10 297.228007
8 1980 1 11 296.916066
9 1980 1 12 297.247884
10 1980 1 13 297.851888
11 1980 1 14 298.854523
Answer 0 (score: 1)
You probably need transform, which broadcasts each month's quantile back onto the rows of that month:
df['New']=df.groupby(['Month'])['TmaxScaled'].transform(lambda x :x.quantile(0.5) )
df1,df2=df.loc[df.TmaxScaled>df.New],df.loc[df.TmaxScaled<=df.New]
df1
Out[43]:
Year Month Day TmaxScaled New
4 1980 1 7 297.468730 297.237946
5 1980 1 8 298.330566 297.237946
6 1980 1 9 297.844157 297.237946
9 1980 1 12 297.247884 297.237946
10 1980 1 13 297.851888 297.237946
11 1980 1 14 298.854523 297.237946
df2
Out[44]:
Year Month Day TmaxScaled New
0 1980 1 3 296.941457 297.237946
1 1980 1 4 296.978455 297.237946
2 1980 1 5 296.654368 297.237946
3 1980 1 6 296.732218 297.237946
7 1980 1 10 297.228007 297.237946
8 1980 1 11 296.916066 297.237946
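As a quick check that the transform approach works with more than one month, here is a minimal self-contained sketch. The two-month sample values and the names month_q, above and below are made up for illustration; only the Month and TmaxScaled column names come from the question.

```python
import pandas as pd

# Hypothetical two-month sample; the values are illustrative, not from the CSV.
df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 2, 2, 2, 2],
    "TmaxScaled": [296.0, 297.0, 298.0, 299.0, 300.0, 301.0, 302.0, 303.0],
})

# Broadcast each month's 0.5 quantile back onto the rows of that month.
month_q = df.groupby("Month")["TmaxScaled"].transform(lambda x: x.quantile(0.5))

# Every row is compared against the quantile of its own month.
above = df[df["TmaxScaled"] > month_q]
below = df[df["TmaxScaled"] <= month_q]
```

Keeping the quantile in a separate Series instead of a New column leaves the original DataFrame unchanged; that is a design choice of this sketch, not something the answer requires.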
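Answer 1 (score: 1)
Compute sQ as you already did. With the single-month sample shown here, .item() reduces the result to a scalar (it raises a ValueError when the data contains more than one month):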
sQ = df.groupby(['Month']).quantile(0.5)['TmaxScaled'].item()
print(sQ)
297.2379455
Now do a groupby and split df into groups:
df_dict = {k : g for k, g in df.groupby(df.TmaxScaled > sQ)}
df_dict[True] returns a DataFrame with the values above the quantile, and df_dict[False] returns the rest:
df_dict[True]
Year Month Day TmaxScaled
4 1980 1 7 297.468730
5 1980 1 8 298.330566
6 1980 1 9 297.844157
9 1980 1 12 297.247884
10 1980 1 13 297.851888
11 1980 1 14 298.854523
df_dict[False]
Year Month Day TmaxScaled
0 1980 1 3 296.941457
1 1980 1 4 296.978455
2 1980 1 5 296.654368
3 1980 1 6 296.732218
7 1980 1 10 297.228007
8 1980 1 11 296.916066
Note that this preserves the row order within each group.
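The snippet above compares every row against one scalar, which fits the single-month sample. For data that spans several months, one possible variant (a sketch, not part of the original answer) keeps sQ as a per-month Series and looks up each row's threshold with Series.map. The sample values and the name threshold are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical two-month sample; the values are illustrative, not from the CSV.
df = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "TmaxScaled": [296.0, 297.0, 298.0, 301.0, 302.0, 303.0],
})

# One quantile per month, indexed by Month (no .item(), so several months work).
sQ = df.groupby("Month")["TmaxScaled"].quantile(0.5)

# Look up the quantile of each row's own month.
threshold = df["Month"].map(sQ)

# Group on the boolean mask: True = above the monthly quantile, False = the rest.
df_dict = {k: g for k, g in df.groupby(df["TmaxScaled"] > threshold)}
```

As before, df_dict[True] holds the rows above their month's quantile and df_dict[False] holds the remaining rows.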