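I'm trying to avoid passing an unchanging dataframe to a recursive function on every recursive call. I'm not actually sure this is a problem, though I can see how it might be.
I have implemented a dummy version of my full recursive function (find_all_ns), which calls another function (find_ns) internally. In the intended use case, find_ns works by comparing a point against every row of a pandas dataframe, so find_ns needs the dataframe as an argument. I'd like to know whether there is a way to avoid calling my recursive function find_all_ns and passing it the dataframe every time; I am not modifying the information in the dataframe at all. I'm not sure whether passing the dataframe through the recursive function is an actual problem, but I'd assume it is? I'll note that making df a global variable and pre-binding find_ns(pt, data=df) won't work in the full setting, because find_all_ns is a callable that can operate on different dataframes.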
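For what it's worth, Python passes arguments by object reference: each call binds the parameter name to the same object, so handing a dataframe to every recursive call copies only a reference, never the data. The sketch below checks this with `id()` at every recursion depth; it uses a plain list as a stand-in for the dataframe (an assumption made to keep it dependency-free), and the same behavior applies to a pandas DataFrame. The helper `recurse` is hypothetical.

```python
def recurse(data, depth, seen_ids):
    """Record the identity of `data` at every recursion depth (hypothetical helper)."""
    seen_ids.append(id(data))  # id() is unique for the object's lifetime
    if depth == 0:
        return seen_ids
    return recurse(data, depth - 1, seen_ids)


big = list(range(1_000_000))  # stand-in for a large dataframe
ids = recurse(big, 10, [])
# Every call saw the very same object; nothing was copied.
print(len(ids), all(i == id(big) for i in ids))  # 11 True
```

The flip side is that, because nothing is copied, an in-place modification anywhere in the recursion would be visible to every caller. Since the dataframe is never modified here, that is not a concern.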
# Dummy version
data = [0, 1, 2, 3, 6]  # Real version is a ~large dataframe


def find_all_ns(neighbors, prev_checked):
    to_check = neighbors - prev_checked
    if len(to_check) == 0:
        return neighbors
    else:
        new_ns = set()
        for pt in to_check:
            nns = find_ns(pt)
            new_ns = new_ns.union(nns)
            prev_checked.add(pt)
        neighs = neighbors.union(new_ns)
        return find_all_ns(neighs, prev_checked)


def find_ns(pt):
    ns = set()
    for other_pt in data:  # real version needs the full dataframe passed in
        if abs(other_pt - pt) <= 1:
            ns.add(other_pt)
    return ns


all_ns = find_all_ns({0}, set())
print(all_ns)
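One way to stop threading the data through every call, while still letting each caller supply a different dataset (so no global is needed), is to pass the data once to an outer function and let an inner recursive helper close over it. This is a minimal sketch on the dummy list; the signature `find_all_ns(data, start)` and the inner names `find_ns`/`expand` are a hypothetical rearrangement, not the original API.

```python
def find_all_ns(data, start):
    """Return every point reachable from `start` through chains of neighbors."""

    def find_ns(pt):
        # `data` is read from the enclosing scope; it is never reassigned.
        return {other_pt for other_pt in data if abs(other_pt - pt) <= 1}

    def expand(neighbors, prev_checked):
        to_check = neighbors - prev_checked
        if not to_check:
            return neighbors
        new_ns = set()
        for pt in to_check:
            new_ns |= find_ns(pt)
        # Build new sets instead of mutating the caller's arguments.
        return expand(neighbors | new_ns, prev_checked | to_check)

    return expand(set(start), set())


print(sorted(find_all_ns([0, 1, 2, 3, 6], {0})))  # [0, 1, 2, 3]
print(sorted(find_all_ns([5, 6, 10], {5})))       # [5, 6]
```

Because the inner functions are created afresh on each top-level call, every call works against whatever data it was given, which is the property that rules out a module-level global.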
'''
find_neighbors is the full version of find_ns, and I will not be able to
change it to avoid requiring a dataframe (df). So it seems that I have to
pass find_all_ns a dataframe argument repeatedly - and this seems like it
could be a problem.
'''
def find_neighbors(dist_metric, epsi, df, pt):
    neighborhood = set()
    my_pt = df.loc[pt, :]  # pt is an index of a row
    for index, row in df.iterrows():
        dist = dist_metric(my_pt, row)
        if dist <= epsi:  # keep only rows within epsi of my_pt
            neighborhood.add(index)
    return neighborhood
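Since find_neighbors itself can't change, another option is to bind its fixed arguments once with `functools.partial` and give find_all_ns a one-argument callable instead of the dataframe. In this sketch, find_all_ns takes the callable as a new `find_ns` parameter (an assumption), and, to keep it runnable without pandas, `find_neighbors_list` and `abs_dist` are hypothetical list-based stand-ins that use the same argument order as find_neighbors (dist_metric, epsi, data, pt).

```python
from functools import partial


def find_all_ns(find_ns, neighbors, prev_checked=frozenset()):
    """Expand `neighbors` until closed; `find_ns` is any callable pt -> set."""
    to_check = neighbors - prev_checked
    if not to_check:
        return neighbors
    new_ns = set()
    for pt in to_check:
        new_ns |= find_ns(pt)
    # Only the small bound callable is passed down; the data lives inside it.
    return find_all_ns(find_ns, neighbors | new_ns, prev_checked | to_check)


def find_neighbors_list(dist_metric, epsi, data, pt):
    """Hypothetical list-based stand-in with find_neighbors' argument order."""
    my_pt = data[pt]  # pt is an index, as in the DataFrame version
    return {i for i, other in enumerate(data) if dist_metric(my_pt, other) <= epsi}


def abs_dist(a, b):
    """Hypothetical distance metric for scalar points."""
    return abs(a - b)


data = [0, 1, 2, 3, 6]
find_ns = partial(find_neighbors_list, abs_dist, 1, data)  # binds everything except pt
print(sorted(find_all_ns(find_ns, {0})))  # indices [0, 1, 2, 3]
```

With the real function, the corresponding binding would be `partial(find_neighbors, dist_metric, epsi, df)`, which leaves only `pt` to be supplied; each dataframe gets its own bound callable, so find_all_ns never sees the dataframe directly.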