Chaining operators to identify the record whose value is closest to a number

Time: 2016-04-22 17:10:35

Tags: python pandas

I have a DataFrame df:

id  Volume  time_norm   time_norm_ratio speed   BPR_free_speed  free_flow_speed capacity_speed  dev_free_flow
9SOUTHBOUND 1474    85  1.794392523 8.947916667 17.88   16.05607477 8.028037383 0.919879283
9SOUTHBOUND 1375    17  1.158878505 13.85483871 17.88   16.05607477 8.028037383 5.826801327
9SOUTHBOUND 1052    22  1.205607477 13.31782946 17.88   16.05607477 8.028037383 5.289792074
9SOUTHBOUND 986 21  1.196261682 13.421875   17.88   16.05607477 8.028037383 5.393837617
9SOUTHBOUND 1071    15  1.140186916 14.08196721 17.88   16.05607477 8.028037383 6.05392983
9SOUTHBOUND 1206    34  1.317757009 12.18439716 17.88   16.05607477 8.028037383 4.15635978
9SOUTHBOUND 1222    34  1.317757009 12.18439716 17.88   16.05607477 8.028037383 4.15635978
9SOUTHBOUND 1408    33  1.308411215 12.27142857 17.88   16.05607477 8.028037383 4.243391188
9SOUTHBOUND 1604    69  1.644859813 9.761363636 17.88   16.05607477 8.028037383 1.733326253
9SOUTHBOUND 1731    124 2.158878505 7.437229437 17.88   16.05607477 8.028037383 -0.590807946
9SOUTHBOUND 1596    640 6.981308411 2.299866131 17.88   16.05607477 8.028037383 -5.728171252
9NORTHBOUND 449 17  1.17    14.66666667 17.88   17.16   8.58    6.086666667
9NORTHBOUND 299 17  1.17    14.66666667 17.88   17.16   8.58    6.086666667
9NORTHBOUND 241 18  1.18    14.54237288 17.88   17.16   8.58    5.962372881
9NORTHBOUND 164 13  1.13    15.18584071 17.88   17.16   8.58    6.605840708
9NORTHBOUND 142 16  1.16    14.79310345 17.88   17.16   8.58    6.213103448
9NORTHBOUND 137 15  1.15    14.92173913 17.88   17.16   8.58    6.34173913
9NORTHBOUND 196 13  1.13    15.18584071 17.88   17.16   8.58    6.605840708

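I want to find the Volume at which speed is 50% of the maximum speed for each id. To do this, I found the maximum speed for each id (free_flow_speed), took 50% of it, and stored that as capacity_speed. To determine which record is closest to 50% of free_flow_speed, I created the dev_free_flow column, which is the difference between speed and capacity_speed. The record whose dev_free_flow is closest to zero, for each id, identifies the record whose Volume should be assigned as the cap_design value.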

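So I want to create a new column, cap_design, that holds, for each id, the Volume of the record whose dev_free_flow is closest to zero.

Following on from my last question (I'm having a fun day here), I came up with:

df['cap_design'] = df['Volume'].where(df.groupby('id')['diff'].transform('min'))

However, this returns the Volume value of every row, rather than the Volume of the row where dev_free_flow is closest to zero for each id. How can I achieve this?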

1 answer:

Answer 0 (score: 2)

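With pd.Series.searchsorted(), you can get the position at which a given value (in your case, 50% of Series.max()) would have to be inserted into a sorted Series to keep it in order. You can then use that position to select the matching value in another Series (Volume). So, using what seems to be the relevant subset of your data: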

             id  Volume      speed
13  9NORTHBOUND     241  14.542373
11  9NORTHBOUND     449  14.666667
12  9NORTHBOUND     299  14.666667
15  9NORTHBOUND     142  14.793103
16  9NORTHBOUND     137  14.921739
14  9NORTHBOUND     164  15.185841
17  9NORTHBOUND     196  15.185841
10  9SOUTHBOUND    1596   2.299866
9   9SOUTHBOUND    1731   7.437229
0   9SOUTHBOUND    1474   8.947917
8   9SOUTHBOUND    1604   9.761364
5   9SOUTHBOUND    1206  12.184397
6   9SOUTHBOUND    1222  12.184397
7   9SOUTHBOUND    1408  12.271429
2   9SOUTHBOUND    1052  13.317829
3   9SOUTHBOUND     986  13.421875
1   9SOUTHBOUND    1375  13.854839
4   9SOUTHBOUND    1071  14.081967

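Using:

import pandas as pd

# Sort by speed within each id so that searchsorted sees ascending values
df = df.sort_values(['id', 'speed'])
# Passing a one-element list makes searchsorted return an array,
# so iloc yields a one-row Series (with its original index) per id
df.groupby('id').apply(lambda x: x.Volume.iloc[x.speed.searchsorted([x.speed.max() * .5])])

gives: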

9NORTHBOUND  13     241
9SOUTHBOUND  9     1731
Name: Volume, dtype: int64
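
To make the searchsorted step concrete, here is a small self-contained sketch for the 9SOUTHBOUND rows only. The variable names (speed, volume, target, pos, picked) are made up for illustration, and the int() conversion assumes a recent pandas in which a scalar argument returns a scalar position.

```python
import pandas as pd

# 9SOUTHBOUND speeds from the table above, already sorted ascending,
# with the Volume of each row in the same order
speed = pd.Series([2.299866, 7.437229, 8.947917, 9.761364, 12.184397, 12.184397,
                   12.271429, 13.317829, 13.421875, 13.854839, 14.081967])
volume = pd.Series([1596, 1731, 1474, 1604, 1206, 1222,
                    1408, 1052, 986, 1375, 1071])

# Half of the maximum observed speed (about 7.04)
target = speed.max() / 2

# Position at which target would be inserted to keep speed sorted,
# i.e. the first row whose speed is not less than target
pos = int(speed.searchsorted(target))

# Volume of the row at that position
picked = int(volume.iloc[pos])
print(pos, picked)
```

Note that searchsorted returns the first position whose value is not less than the target, which is not necessarily the nearest value. In this data the two coincide.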

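If you want the result as a new column, you can do the following:

# For each id, repeat the selected Volume once per row and align it with that group's index
df['result'] = df.groupby('id', as_index=False).apply(lambda x: pd.Series(x.Volume.iloc[x.speed.searchsorted([x.speed.max() / 2])].tolist() * len(x), index=x.index)).reset_index(level=0, drop=True)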

df.loc[:, ['id', 'Volume', 'speed', 'result']]

             id  Volume      speed  result
0   9NORTHBOUND     241  14.542373     241
1   9NORTHBOUND     449  14.666667     241
2   9NORTHBOUND     299  14.666667     241
3   9NORTHBOUND     142  14.793103     241
4   9NORTHBOUND     137  14.921739     241
5   9NORTHBOUND     164  15.185841     241
6   9NORTHBOUND     196  15.185841     241
7   9SOUTHBOUND    1596   2.299866    1731
8   9SOUTHBOUND    1731   7.437229    1731
9   9SOUTHBOUND    1474   8.947917    1731
10  9SOUTHBOUND    1604   9.761364    1731
11  9SOUTHBOUND    1206  12.184397    1731
12  9SOUTHBOUND    1222  12.184397    1731
13  9SOUTHBOUND    1408  12.271429    1731
14  9SOUTHBOUND    1052  13.317829    1731
15  9SOUTHBOUND     986  13.421875    1731
16  9SOUTHBOUND    1375  13.854839    1731
17  9SOUTHBOUND    1071  14.081967    1731
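
As an alternative that matches the "closest to zero" wording of the question more directly, the following is a rough sketch rather than the method used above. It takes the absolute distance of each speed from half of the group maximum and picks the row where that distance is smallest. The names half_max, dist, closest and vol_by_id are made up for illustration. If the dev_free_flow column is available, df['dev_free_flow'].abs() could presumably stand in for dist.

```python
import pandas as pd

# The subset of the data used above, rebuilt so the example runs on its own
df = pd.DataFrame({
    "id": ["9NORTHBOUND"] * 7 + ["9SOUTHBOUND"] * 11,
    "Volume": [241, 449, 299, 142, 137, 164, 196,
               1596, 1731, 1474, 1604, 1206, 1222, 1408, 1052, 986, 1375, 1071],
    "speed": [14.542373, 14.666667, 14.666667, 14.793103, 14.921739,
              15.185841, 15.185841,
              2.299866, 7.437229, 8.947917, 9.761364, 12.184397, 12.184397,
              12.271429, 13.317829, 13.421875, 13.854839, 14.081967],
})

# Half of each id's maximum speed, broadcast back to every row
half_max = df.groupby("id")["speed"].transform("max") / 2

# Absolute distance of each row's speed from that target
dist = (df["speed"] - half_max).abs()

# Index label of the closest row within each id
closest = dist.groupby(df["id"]).idxmin()

# Volume at the closest row, keyed by id
vol_by_id = pd.Series(df.loc[closest, "Volume"].to_numpy(), index=closest.index)

# Map that Volume onto every row of the same id
df["cap_design"] = df["id"].map(vol_by_id)
print(df)
```

On this data it selects the same rows as the searchsorted approach (241 for 9NORTHBOUND, 1731 for 9SOUTHBOUND), but it needs no sorting and looks at values on both sides of the target.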