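I have a function that takes 128 ms of CPU time. I tried to parallelize it, but it behaves exactly the same: there is no speedup at all. I'm not sure what I'm doing wrong.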
Here is the computationally expensive function:
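def compute(id, x, y, frame_id, df_region, region_buffered, df_line):
    gb = ('trajectory_id',)
    # Global DataFrame of all observations; under multiprocessing each worker process holds its own copy
    global general_pd
    # Append the new observation for this trajectory
    general_pd.loc[len(general_pd)] = [id, x, y, frame_id]
    # Select every point recorded so far for this trajectory
    grouped = general_pd.loc[general_pd['trajectory_id'] == id]
    # Use the frame_id parameter (the original referenced an undefined frame_idx here)
    rpp = RawParameterProcessor(grouped, df_line, frame_id, df_region, region_buffered, gb=gb, v_thresh=5 / 3.6)
    df_parameter_car = rpp.compute()
    return df_parameter_car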
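One thing worth checking is the global general_pd: with a multiprocessing pool, each worker is a separate process with its own copy of module-level globals, so rows appended inside a worker never reach the parent or the other workers. The sketch below is a minimal, self-contained illustration of that behavior, assuming a plain list named history (hypothetical) stands in for general_pd and a hypothetical record() function stands in for compute().

```python
from multiprocessing import Pool

# Hypothetical stand-in for general_pd: a module-level list of observations
history = []

def record(obs):
    global history
    # This append happens in the worker process's own copy of history
    history.append(obs)
    # Number of observations this particular worker has seen so far
    return len(history)

def run(n_tasks=8, n_workers=2):
    with Pool(n_workers) as pool:
        counts = pool.map(record, range(n_tasks))
    # The parent's history is untouched by anything the workers did
    return counts, len(history)

if __name__ == "__main__":
    counts, parent_len = run()
    print("per-worker counts:", counts)
    print("rows in the parent's history:", parent_len)
```

If the trajectory history really has to accumulate across frames, it has to live somewhere all tasks can see (for example, kept in the parent and passed in), rather than in a per-process global.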
I call compute() from a TensorFlow object-detection pipeline to run the computation on each detected object.
Note that the following code runs inside the main loop, where I read frames from a cv2 video capture:
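for trk in car_detections:
    trk = trk.astype(np.int32)
    # Bounding-box center as a 1x1x2 float32 array, the shape cv2.perspectiveTransform expects
    p = np.array([[((trk[1] + trk[3]) / 2, (trk[0] + trk[2]) / 2)]], dtype=np.float32)
    # Project the center point through the homography H
    center_pt = cv2.perspectiveTransform(p, H)
    ptx = center_pt.T.item(0)
    pty = center_pt.T.item(1)
    # Submit one task per detection to the process pool
    df_cars = pool.apply_async(compute, (trk[4], ptx, pty, frame_idx, df_region, region_buffered, df_line))
    results.append(df_cars)
# Collect the results; get() blocks until each task has finished
for result in results:
    # DataFrame.append was removed in pandas 2.0, so concatenate instead
    genera_data_pd_cars = pd.concat([genera_data_pd_cars, result.get()])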
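Each apply_async call in the loop pickles df_region, region_buffered and df_line again and sends them to a worker, in addition to the few small per-detection values. For a task of roughly 128 ms, that repeated transfer may eat much of the gain. A common alternative is to send large read-only inputs once per worker through the initializer and initargs arguments of multiprocessing.Pool. The sketch below assumes those inputs do not change between tasks; init_worker, work, run_tasks and _shared are hypothetical names, plain lists stand in for the DataFrames, and work() does a placeholder computation rather than what compute() does.

```python
from multiprocessing import Pool

# Hypothetical per-worker storage, filled once by the pool initializer
_shared = {}

def init_worker(region, line):
    # Runs once in each worker process when the pool starts, so the large
    # read-only inputs are transferred once per worker, not once per task
    _shared["region"] = region
    _shared["line"] = line

def work(task):
    # Placeholder computation standing in for compute(); only the small
    # per-detection values (track id, x, y) travel with each task
    track_id, x, y = task
    return track_id, x + y + len(_shared["region"]) + len(_shared["line"])

def run_tasks(tasks, region, line, n_workers=2):
    with Pool(n_workers, initializer=init_worker, initargs=(region, line)) as pool:
        # map() returns results in the same order as the tasks
        return pool.map(work, tasks)

if __name__ == "__main__":
    region = list(range(1000))  # stand-in for df_region
    line = [0, 1]               # stand-in for df_line
    print(run_tasks([(1, 1.0, 2.0), (2, 3.0, 4.0)], region, line))
```

Applied to this loop, that would mean building a list of (trk[4], ptx, pty, frame_idx) tuples per frame and creating the pool once with df_region, region_buffered and df_line in initargs. This only holds if those three objects stay the same for the whole video.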