Grouping a DataFrame by percentile level

Time: 2019-11-02 09:43:53

Tags: pandas dataframe pandas-groupby quantile

I have a DataFrame:

('U', 'OLHC', '+') count: 127

Date                         Open     High      Low    Close Sign Struct Trend      OH      HL      LC      OL      LH     HC
1997-06-17 00:00:00+00:00   812.97   897.60   811.80   894.42    +   OLHC     U   84.63   85.80   82.62    1.17   85.80   3.18
1998-03-08 00:00:00+00:00   957.59  1055.69   954.24  1055.69    +   OLHC     U   98.10  101.45  101.45    3.35  101.45   0.00
1998-10-14 00:00:00+00:00   957.28  1066.11   923.32  1005.53    +   OLHC     U  108.83  142.79   82.21   33.96  142.79  60.58
1998-11-27 00:00:00+00:00  1005.53  1192.97  1000.12  1192.33    +   OLHC     U  187.44  192.85  192.21    5.41  192.85   0.64
1999-01-10 00:00:00+00:00  1192.33  1278.24  1136.89  1275.09    +   OLHC     U   85.91  141.35  138.20   55.44  141.35   3.15
1999-04-08 00:00:00+00:00  1271.18  1344.08  1216.03  1343.98    +   OLHC     U   72.90  128.05  127.95   55.15  128.05   0.10
1999-11-14 00:00:00+00:00  1282.81  1396.12  1233.70  1396.06    +   OLHC     U  113.31  162.42  162.36   49.11  162.42   0.06
2001-04-25 00:00:00+00:00  1182.91  1253.76  1081.19  1228.75    +   OLHC     U   70.85  172.57  147.56  101.72  172.57  25.01
2001-12-01 00:00:00+00:00  1066.98  1163.38  1052.83  1137.88    +   OLHC     U   96.40  110.55   85.05   14.15  110.55  25.50
2003-03-30 00:00:00+00:00   836.25   895.78   788.90   863.50    +   OLHC     U   59.53  106.88   74.60   47.35  106.88  32.28
2003-05-13 00:00:00+00:00   863.50   947.51   843.68   942.30    +   OLHC     U   84.01  103.83   98.62   19.82  103.83   5.21
2003-09-22 00:00:00+00:00   977.59  1040.18   974.21  1022.82    +   OLHC     U   62.59   65.97   48.61    3.38   65.97  17.36
2003-11-05 00:00:00+00:00  1022.82  1061.44   990.34  1051.81    +   OLHC     U   38.62   71.10   61.47   32.48   71.10   9.63
2003-12-19 00:00:00+00:00  1053.14  1091.03  1031.24  1088.67    +   OLHC     U   37.89   59.79   57.43   21.90   59.79   2.36
2004-12-05 00:00:00+00:00  1095.74  1197.11  1090.23  1191.17    +   OLHC     U  101.37  106.88  100.94    5.51  106.88   5.94
2005-01-18 00:00:00+00:00  1190.84  1217.90  1173.76  1195.98    +   OLHC     U   27.06   44.14   22.22   17.08   44.14  21.92
2005-05-30 00:00:00+00:00  1142.40  1199.56  1136.22  1198.78    +   OLHC     U   57.16   63.34   62.56    6.18   63.34   0.78
2006-02-18 00:00:00+00:00  1274.61  1294.90  1253.61  1287.24    +   OLHC     U   20.29   41.29   33.63   21.00   41.29   7.66
2006-04-03 00:00:00+00:00  1287.14  1310.88  1268.42  1297.81    +   OLHC     U   23.74   42.46   29.39   18.72   42.46  13.07
2006-09-26 00:00:00+00:00  1267.60  1336.60  1266.67  1336.34    +   OLHC     U   69.00   69.93   69.67    0.93   69.93   0.26
2006-11-09 00:00:00+00:00  1335.37  1389.45  1327.10  1378.33    +   OLHC     U   54.08   62.35   51.23    8.27   62.35  11.12
2006-12-23 00:00:00+00:00  1378.35  1431.81  1375.60  1410.76    +   OLHC     U   53.46   56.21   35.16    2.75   56.21  21.05
2007-09-13 00:00:00+00:00  1455.27  1503.41  1370.60  1483.95    +   OLHC     U   48.14  132.81  113.35   84.67  132.81  19.46
2008-04-20 00:00:00+00:00  1293.37  1395.90  1256.98  1390.33    +   OLHC     U  102.53  138.92  133.35   36.39  138.92   5.57
2009-04-07 00:00:00+00:00   770.05   845.61   666.79   815.55    +   OLHC     U   75.56  178.82  148.76  103.26  178.82  30.06
2009-05-21 00:00:00+00:00   815.55   930.17   814.84   888.33    +   OLHC     U  114.62  115.33   73.49    0.71  115.33  41.84
2009-07-04 00:00:00+00:00   888.33   956.23   881.46   896.42    +   OLHC     U   67.90   74.77   14.96    6.87   74.77  59.81
2009-08-17 00:00:00+00:00   896.42  1018.00   869.32   979.73    +   OLHC     U  121.58  148.68  110.41   27.10  148.68  38.27
2009-09-30 00:00:00+00:00   979.73  1080.15   979.73  1057.08    +   OLHC     U  100.42  100.42   77.35    0.00  100.42  23.07
2009-11-13 00:00:00+00:00  1057.08  1105.37  1020.18  1093.48    +   OLHC     U   48.29   85.19   73.30   36.90   85.19  11.89
2010-10-31 00:00:00+00:00  1126.57  1196.14  1122.79  1183.26    +   OLHC     U   69.57   73.35   60.47    3.78   73.35  12.88
2010-12-14 00:00:00+00:00  1185.71  1246.73  1173.00  1241.59    +   OLHC     U   61.02   73.73   68.59   12.71   73.73   5.14
2011-01-27 00:00:00+00:00  1241.58  1301.29  1232.85  1299.54    +   OLHC     U   59.71   68.44   66.69    8.73   68.44   1.75
2011-03-12 00:00:00+00:00  1299.63  1344.07  1275.10  1304.28    +   OLHC     U   44.44   68.97   29.18   24.53   68.97  39.79
2011-12-01 00:00:00+00:00  1223.46  1292.66  1158.66  1244.58    +   OLHC     U   69.20  134.00   85.92   64.80  134.00  48.08
2012-01-14 00:00:00+00:00  1246.03  1296.82  1202.37  1289.09    +   OLHC     U   50.79   94.45   86.72   43.66   94.45   7.73
2012-02-27 00:00:00+00:00  1290.22  1371.94  1290.22  1367.59    +   OLHC     U   81.72   81.72   77.37    0.00   81.72   4.35
2012-07-08 00:00:00+00:00  1318.90  1374.81  1266.74  1354.68    +   OLHC     U   55.91  108.07   87.94   52.16  108.07  20.13
2012-08-21 00:00:00+00:00  1354.66  1426.68  1325.41  1413.17    +   OLHC     U   72.02  101.27   87.76   29.25  101.27  13.51
2012-12-31 00:00:00+00:00  1366.42  1448.00  1366.42  1426.19    +   OLHC     U   81.58   81.58   59.77    0.00   81.58  21.81
2013-02-13 00:00:00+00:00  1426.19  1524.69  1426.19  1520.33    +   OLHC     U   98.50   98.50   94.14    0.00   98.50   4.36

Then I select only the events whose value exceeds a given quantile of the column:

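col_name = 'HC'
group_name = col_name + '_lev'

level_value = .95
# Keep rows strictly above the 95% quantile of the column.
# .copy() makes hc_group an independent frame, so the assignment
# below does not trigger SettingWithCopyWarning.
hc_group = df[df[col_name] > df[col_name].quantile(level_value)].copy()
hc_group[group_name] = col_name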
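
For readers without the original data, here is a minimal self-contained sketch of the same selection. The `toy` frame, its values, and the 0.8 level are made up for illustration and are not part of the question's data set.

```python
import pandas as pd

# Toy data, made up for illustration; not the question's data set.
toy = pd.DataFrame(
    {'HC': [3.0, 0.5, 60.0, 1.0, 25.0, 7.0, 12.0, 40.0, 2.0, 9.0]}
)

col_name = 'HC'
level_value = 0.8

# Value below which 80% of the column falls
# (pandas interpolates linearly between data points by default).
threshold = toy[col_name].quantile(level_value)

# Keep only rows strictly above the threshold. .copy() returns an
# independent frame, so adding a column is safe.
events = toy[toy[col_name] > threshold].copy()
events[col_name + '_lev'] = col_name

print(threshold)
print(events)
```

Because the comparison is strict (`>`), a row whose value equals the quantile exactly is not selected.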

I get this result:

Date                        Open     High      Low    Close Sign Struct Trend      OH      HL      LC     OL      LH     HC HC_lev
1998-10-14 00:00:00+00:00   957.28  1066.11   923.32  1005.53    +   OLHC     U  108.83  142.79   82.21  33.96  142.79  60.58     HC
2009-05-21 00:00:00+00:00   815.55   930.17   814.84   888.33    +   OLHC     U  114.62  115.33   73.49   0.71  115.33  41.84     HC
2009-07-04 00:00:00+00:00   888.33   956.23   881.46   896.42    +   OLHC     U   67.90   74.77   14.96   6.87   74.77  59.81     HC
2009-08-17 00:00:00+00:00   896.42  1018.00   869.32   979.73    +   OLHC     U  121.58  148.68  110.41  27.10  148.68  38.27     HC
2011-03-12 00:00:00+00:00  1299.63  1344.07  1275.10  1304.28    +   OLHC     U   44.44   68.97   29.18  24.53   68.97  39.79     HC
2011-12-01 00:00:00+00:00  1223.46  1292.66  1158.66  1244.58    +   OLHC     U   69.20  134.00   85.92  64.80  134.00  48.08     HC
2016-06-29 00:00:00+00:00  2065.04  2120.55  1991.68  2070.77    +   OLHC     U   55.51  128.87   79.09  73.36  128.87  49.78     HC

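This code works as expected.

I want to do the same for every column in the list ['OH','HL','LC','OL','LH','HC'] and get back a groupby object grouped by these columns.

In other words, I need one large object that includes oh_group, hl_group, ..., hc_group.

Could you tell me how to approach this?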

1 Answer:

Answer 0 (score: 0)

I finally found a solution. It may be useful to others.

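import pandas as pd

names = ['OH', 'HL', 'LC', 'OL', 'LH', 'HC']
percentiles = [.75, .90, .95, .98]

for col_name in names:
    # total_df was never initialized in the original snippet. Resetting it
    # here (an assumption) keeps each printout limited to the current column.
    total_df = pd.DataFrame()
    for perc in percentiles:
        # Rows strictly above this quantile of the column; .copy() avoids
        # SettingWithCopyWarning when the 'Level' column is added.
        k = df[df[col_name] > df[col_name].quantile(perc)].copy()
        k['Level'] = str(perc)

        total_df = pd.concat([total_df, k], sort=False)

    print(col_name + ' events:')
    print('----------')
    # print_groups is my own helper for printing each group; it is not shown here.
    print_groups(total_df.groupby('Level'))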
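
If a single object covering all columns is what is needed, as the question asks, one possible variation is to tag every selected row with both the column name and the level, then group by the pair. This is only a sketch: the helper name `quantile_events`, the `Column` key, and the toy data are assumptions for illustration, not part of the original answer.

```python
import pandas as pd


def quantile_events(df, names, percentiles):
    """Hypothetical helper: stack the rows of df that exceed each
    quantile of each column, tagged with the column and the level."""
    pieces = []
    for col_name in names:
        for perc in percentiles:
            threshold = df[col_name].quantile(perc)
            piece = df[df[col_name] > threshold].copy()
            piece['Column'] = col_name  # which column triggered the event
            piece['Level'] = perc       # which quantile level it exceeded
            pieces.append(piece)
    # A single concat at the end avoids re-copying the frame on every pass.
    return pd.concat(pieces, sort=False)


# Toy data, made up for illustration.
toy = pd.DataFrame({
    'OH': [10.0, 20.0, 30.0, 40.0, 50.0],
    'HC': [5.0, 1.0, 4.0, 2.0, 3.0],
})

events = quantile_events(toy, ['OH', 'HC'], [0.5, 0.75])
grouped = events.groupby(['Column', 'Level'])

for (column, level), group in grouped:
    print(column, level, len(group))
```

A row that exceeds several levels appears once per level, just as in the loop above. An individual group can be pulled out with, for example, `grouped.get_group(('HC', 0.75))`.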