I am trying to build a k-d tree from three columns of a DataFrame that looks like this:
+-----------+--------+----------+---------+
| obj_type| Cord1| Cord2| Cord3|
+-----------+--------+----------+---------+
|prox_fmr1t2|559.6759|-4684.2472|4281.8491|
| prox_never|560.0638|-4684.4120|4281.6181|
| prox_never|560.4613|-4684.3282|4281.6578|
+-----------+--------+----------+---------+
using these commands, which work correctly:
np_obj_filter=np.array(obj_filter.rdd.map(lambda l: (l[0],l[1],l[2])).collect())
tree = spatial.cKDTree(np_obj_filter,leafsize=16)
The problem is that collect() is an expensive operation. Is there a way to avoid it and speed up building the k-d tree for millions of points?
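For reference, here is a minimal sketch of the tree-building step in isolation, without Spark. It assumes the coordinates are already in a NumPy array of shape (n, 3); the `coords` name is illustrative only, and the values are the three sample rows from the table above.

```python
import numpy as np
from scipy import spatial

# The three sample rows from the table (Cord1, Cord2, Cord3).
# In the real job this array comes from the collected DataFrame.
coords = np.array([
    [559.6759, -4684.2472, 4281.8491],
    [560.0638, -4684.4120, 4281.6181],
    [560.4613, -4684.3282, 4281.6578],
])

# Same constructor call as in the question.
tree = spatial.cKDTree(coords, leafsize=16)

# Sanity check: look up the nearest neighbour of the first point.
dist, idx = tree.query(coords[0], k=1)
```

The tree itself lives in the memory of a single process, so the part I am hoping to speed up is getting the coordinates out of the DataFrame and into that array.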