Using Scipy percentileofscore on a groupby DataFrame

Asked: 2017-04-17 21:06:38

Tags: python pandas scipy

I created a DataFrame by reading CSV data in the following format:

Date,Open,High,Low,Close,Volume,Adj Close,Ticker,Indicator1,Indicator2
42255,91.760002,92.790001,90.400002,92.720001,3085500,86.16844,LB,302.911961,45.621095920339
42251,88.550003,90.860001,88,90.379997,3230200,83.993779,LB,211.511385,45.7675721184876
42250,87.110001,90.769997,87.110001,89.279999,3989900,82.971506,LB,177.1386378,46.0213252964444
42255,65.82,66.790001,65.739998,66.769997,6397600,64.544698,DD,140.6188408,46.1284286660104
42251,30.559999,31.41,30.559999,31.4,13911700,31.4,EBAY,128.3615396,46.6328167692573
42250,64.279999,66.199997,64.279999,66.110001,6612700,63.906699,DD,111.3219234,47.1501954595785
42255,173.699997,177.410004,173.699997,177.279999,7107100,177.279999,BRK-B,103.1589082,48.0697637559109
42251,30.309999,30.860001,30.27,30.68,17892900,30.68,EBAY,100.6122268,48.3165158150696
42250,29.809999,30.559999,29.75,30.49,20272000,30.49,EBAY,94.75403852,49.066388420196
42255,84.68,86.010002,83.32,85.730003,3411000,79.672352,LB,88.39444803,50.0061610393543
42251,68.629997,70.099998,68.470001,69.910004,4018100,69.910004,AKAM,84.82357186,50.7093832981117
42250,28.870001,30.309999,28.790001,29.93,44959100,29.93,EBAY,80.94104725,51.6730513843059
42255,49.02,49.240002,47,47.650002,14153200,47.461114,DAL,78.71521075,51.6915087811999
42251,70.360001,74.75,70.360001,71.75,3296300,71.75,EVHC,78.54129955,51.9876960547054

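I want to add another column to the DataFrame that gives the percentile of Indicator1 for a given date, i.e. computed across the values of all tickers on that particular date.

Can someone help me with the Python code for this? I am new to Python.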

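1 Answer:

Answer 0: (score: 1)

IIUC: use the rank method with pct=True, which returns each value's rank as a fraction of the number of values ranked.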

print(df)
     Date        Open        High         Low       Close    Volume   Adj Close  Ticker  Indicator1  Indicator2
0   42255   91.760002   92.790001   90.400002   92.720001   3085500   86.168440      LB  302.911961   45.621096
1   42251   88.550003   90.860001   88.000000   90.379997   3230200   83.993779      LB  211.511385   45.767572
2   42250   87.110001   90.769997   87.110001   89.279999   3989900   82.971506      LB  177.138638   46.021325
3   42255   65.820000   66.790001   65.739998   66.769997   6397600   64.544698      DD  140.618841   46.128429
4   42251   30.559999   31.410000   30.559999   31.400000  13911700   31.400000    EBAY  128.361540   46.632817
5   42250   64.279999   66.199997   64.279999   66.110001   6612700   63.906699      DD  111.321923   47.150195
6   42255  173.699997  177.410004  173.699997  177.279999   7107100  177.279999   BRK-B  103.158908   48.069764
7   42251   30.309999   30.860001   30.270000   30.680000  17892900   30.680000    EBAY  100.612227   48.316516
8   42250   29.809999   30.559999   29.750000   30.490000  20272000   30.490000    EBAY   94.754039   49.066388
9   42255   84.680000   86.010002   83.320000   85.730003   3411000   79.672352      LB   88.394448   50.006161
10  42251   68.629997   70.099998   68.470001   69.910004   4018100   69.910004    AKAM   84.823572   50.709383
11  42250   28.870001   30.309999   28.790001   29.930000  44959100   29.930000    EBAY   80.941047   51.673051
12  42255   49.020000   49.240002   47.000000   47.650002  14153200   47.461114     DAL   78.715211   51.691509
13  42251   70.360001   74.750000   70.360001   71.750000   3296300   71.750000    EVHC   78.541300   51.987696


df['Indicator1_percentile'] = df.Indicator1.rank(pct=True)

print(df['Indicator1_percentile'])
0     1.000000
1     0.928571
2     0.857143
3     0.785714
4     0.714286
5     0.642857
6     0.571429
7     0.500000
8     0.428571
9     0.357143
10    0.285714
11    0.214286
12    0.142857
13    0.071429
Name: Indicator1_percentile, dtype: float64
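
Note that the rank call above ranks Indicator1 across the whole frame. The question asks for the percentile within each date, so a per-date variant may be closer to what is wanted. Below is a minimal sketch of that, assuming the grouping key is the Date column; the column name Indicator1_pct_by_date is made up for illustration, and only the three columns needed are rebuilt from the sample data.

```python
import pandas as pd

# Only the columns needed for the ranking, copied from the sample data above.
df = pd.DataFrame({
    "Date": [42255, 42251, 42250, 42255, 42251, 42250, 42255,
             42251, 42250, 42255, 42251, 42250, 42255, 42251],
    "Ticker": ["LB", "LB", "LB", "DD", "EBAY", "DD", "BRK-B",
               "EBAY", "EBAY", "LB", "AKAM", "EBAY", "DAL", "EVHC"],
    "Indicator1": [302.911961, 211.511385, 177.138638, 140.618841,
                   128.361540, 111.321923, 103.158908, 100.612227,
                   94.754039, 88.394448, 84.823572, 80.941047,
                   78.715211, 78.541300],
})

# Rank Indicator1 separately within each Date.
# pct=True divides each rank by the number of rows in that Date group.
# "Indicator1_pct_by_date" is a hypothetical column name.
df["Indicator1_pct_by_date"] = df.groupby("Date")["Indicator1"].rank(pct=True)

# Sort only for display, so each date's rows appear together.
print(df.sort_values(["Date", "Indicator1"]))
```

With this grouping, the largest Indicator1 on each date gets 1.0 and the smallest gets 1/n, where n is the number of rows for that date. If a 0-100 scale like the one scipy.stats.percentileofscore returns is preferred, multiplying the column by 100 should give comparable numbers, though that equivalence is worth checking on your own data, especially where values tie.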