I have a DataFrame df:
id Volume time_norm time_norm_ratio speed BPR_free_speed free_flow_speed capacity_speed dev_free_flow
9SOUTHBOUND 1474 85 1.794392523 8.947916667 17.88 16.05607477 8.028037383 0.919879283
9SOUTHBOUND 1375 17 1.158878505 13.85483871 17.88 16.05607477 8.028037383 5.826801327
9SOUTHBOUND 1052 22 1.205607477 13.31782946 17.88 16.05607477 8.028037383 5.289792074
9SOUTHBOUND 986 21 1.196261682 13.421875 17.88 16.05607477 8.028037383 5.393837617
9SOUTHBOUND 1071 15 1.140186916 14.08196721 17.88 16.05607477 8.028037383 6.05392983
9SOUTHBOUND 1206 34 1.317757009 12.18439716 17.88 16.05607477 8.028037383 4.15635978
9SOUTHBOUND 1222 34 1.317757009 12.18439716 17.88 16.05607477 8.028037383 4.15635978
9SOUTHBOUND 1408 33 1.308411215 12.27142857 17.88 16.05607477 8.028037383 4.243391188
9SOUTHBOUND 1604 69 1.644859813 9.761363636 17.88 16.05607477 8.028037383 1.733326253
9SOUTHBOUND 1731 124 2.158878505 7.437229437 17.88 16.05607477 8.028037383 -0.590807946
9SOUTHBOUND 1596 640 6.981308411 2.299866131 17.88 16.05607477 8.028037383 -5.728171252
9NORTHBOUND 449 17 1.17 14.66666667 17.88 17.16 8.58 6.086666667
9NORTHBOUND 299 17 1.17 14.66666667 17.88 17.16 8.58 6.086666667
9NORTHBOUND 241 18 1.18 14.54237288 17.88 17.16 8.58 5.962372881
9NORTHBOUND 164 13 1.13 15.18584071 17.88 17.16 8.58 6.605840708
9NORTHBOUND 142 16 1.16 14.79310345 17.88 17.16 8.58 6.213103448
9NORTHBOUND 137 15 1.15 14.92173913 17.88 17.16 8.58 6.34173913
9NORTHBOUND 196 13 1.13 15.18584071 17.88 17.16 8.58 6.605840708
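For each id, I want to find the Volume at the point where speed is 50% of that id's maximum speed. To do this, I found the maximum speed for each id (free_flow_speed), took 50% of it, and stored the result as capacity_speed. To determine which record is closest to 50% of free_flow_speed, I created the dev_free_flow column, which is the difference between the given speed and capacity_speed. For each id, the record whose dev_free_flow is closest to zero should be the one whose Volume is used as the cap_design value.
So I want to create a new column, cap_design, that holds, for each id, the Volume of the row where dev_free_flow is closest to zero.
Building on my last question (it has been quite a day here), I came up with:
df['cap_design'] = df['Volume'].where(df.groupby('id')['diff'].transform('min'))
However, this returns the Volume of every row, rather than the Volume of the row where dev_free_flow is closest to zero for each id. How can I achieve this?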
Answer 0 (score: 2)
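With pd.Series.searchsorted(), you can get the index at which a given value (in your case, 50% of Series.max()) would have to be inserted into a sorted Series to maintain its order. You can then use that index to select the matching value in another Series (Volume). So, using what appears to be the relevant subset of your data: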
id Volume speed
13 9NORTHBOUND 241 14.542373
11 9NORTHBOUND 449 14.666667
12 9NORTHBOUND 299 14.666667
15 9NORTHBOUND 142 14.793103
16 9NORTHBOUND 137 14.921739
14 9NORTHBOUND 164 15.185841
17 9NORTHBOUND 196 15.185841
10 9SOUTHBOUND 1596 2.299866
9 9SOUTHBOUND 1731 7.437229
0 9SOUTHBOUND 1474 8.947917
8 9SOUTHBOUND 1604 9.761364
5 9SOUTHBOUND 1206 12.184397
6 9SOUTHBOUND 1222 12.184397
7 9SOUTHBOUND 1408 12.271429
2 9SOUTHBOUND 1052 13.317829
3 9SOUTHBOUND 986 13.421875
1 9SOUTHBOUND 1375 13.854839
4 9SOUTHBOUND 1071 14.081967
Using:
df = df.sort_values(['id', 'speed'])
df.groupby('id').apply(lambda x: x.Volume.iloc[x.speed.searchsorted(x.speed.max()*.5)])
gives:
9NORTHBOUND 13 241
9SOUTHBOUND 9 1731
Name: Volume, dtype: int64
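To make the lookup step concrete, here is a minimal sketch of what searchsorted returns for a single group. The four speed and Volume values are the slowest 9SOUTHBOUND rows from the table above, and the target is half of that group's maximum speed (14.081967). The variable names (speed, volume, target, pos, picked_volume) are illustrative only and do not come from the original post.

```python
import pandas as pd

# Speeds for one id, already sorted ascending (slowest 9SOUTHBOUND rows above).
speed = pd.Series([2.299866, 7.437229, 8.947917, 9.761364])
volume = pd.Series([1596, 1731, 1474, 1604])

# Half of the group's maximum speed (14.081967 in the full 9SOUTHBOUND group).
target = 14.081967 / 2

# searchsorted returns the position at which target would be inserted to keep
# the Series sorted, i.e. the first position whose value is >= target.
pos = int(speed.searchsorted(target))

# Use that position to pick the matching Volume.
picked_volume = int(volume.iloc[pos])
print(pos, picked_volume)
```

Note that this is an insertion point, so the selected row is the first one whose speed is at or above the target, which is not necessarily the row nearest to it.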
If you want the result as a new column, you can do the following:
df['result'] = df.groupby('id', as_index=False).apply(lambda x: pd.Series(x.Volume.iloc[x.speed.searchsorted(x.speed.max()/2)].tolist() * len(x),index=x.index)).reset_index(level=0, drop=True)
df.loc[:, ['id', 'Volume', 'speed', 'result']]
id Volume speed result
0 9NORTHBOUND 241 14.542373 241
1 9NORTHBOUND 449 14.666667 241
2 9NORTHBOUND 299 14.666667 241
3 9NORTHBOUND 142 14.793103 241
4 9NORTHBOUND 137 14.921739 241
5 9NORTHBOUND 164 15.185841 241
6 9NORTHBOUND 196 15.185841 241
7 9SOUTHBOUND 1596 2.299866 1731
8 9SOUTHBOUND 1731 7.437229 1731
9 9SOUTHBOUND 1474 8.947917 1731
10 9SOUTHBOUND 1604 9.761364 1731
11 9SOUTHBOUND 1206 12.184397 1731
12 9SOUTHBOUND 1222 12.184397 1731
13 9SOUTHBOUND 1408 12.271429 1731
14 9SOUTHBOUND 1052 13.317829 1731
15 9SOUTHBOUND 986 13.421875 1731
16 9SOUTHBOUND 1375 13.854839 1731
17 9SOUTHBOUND 1071 14.081967 1731
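As a possible alternative that works directly from the dev_free_flow column in the question, the following sketch picks, for each id, the row whose deviation has the smallest absolute value and broadcasts its Volume to every row of that id. It assumes that "closest to zero" means the smallest absolute dev_free_flow; the sample rows are a subset of the question's data, and the helper names closest_idx and volume_by_id are hypothetical.

```python
import pandas as pd

# Subset of the question's rows (assumption: dev_free_flow is already computed).
df = pd.DataFrame({
    "id": ["9SOUTHBOUND"] * 4 + ["9NORTHBOUND"] * 3,
    "Volume": [1474, 1604, 1731, 1596, 449, 241, 164],
    "dev_free_flow": [0.919879283, 1.733326253, -0.590807946, -5.728171252,
                      6.086666667, 5.962372881, 6.605840708],
})

# Row label, per id, of the smallest absolute deviation from zero.
closest_idx = df["dev_free_flow"].abs().groupby(df["id"]).idxmin()

# Volume at those rows, indexed by id.
volume_by_id = df.loc[closest_idx].set_index("id")["Volume"]

# Broadcast the selected Volume back to every row of the same id.
df["cap_design"] = df["id"].map(volume_by_id)
print(df)
```

Unlike searchsorted, this does not require sorting and selects the nearest row on either side of the target, so the two approaches can disagree when the closest speed lies below the 50% mark.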