Avoid repeatedly passing a DataFrame to a recursive function?

Posted: 2018-12-31 18:29:09

Tags: python pandas recursion

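I am trying to avoid passing an unchanging DataFrame to a recursive function on every recursive call. I'm not actually sure this is a problem, though I can see how it might be one.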

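I have implemented a dummy version of my full recursive function (find_all_ns), which calls another function (find_ns) inside it. In the intended use case, find_ns works by comparing a point against every row of a pandas DataFrame, so find_ns needs the DataFrame as an argument. Is there a way to avoid passing the DataFrame to find_all_ns on every recursive call? I never modify the data in the DataFrame. I'm not sure whether passing a DataFrame through a recursive function is a real problem, but I would assume it is. Note that making df a global and pre-binding find_ns(pt, data=df) won't work for my full case, because find_all_ns needs to be callable on different DataFrames.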
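On whether passing the DataFrame down is itself costly: Python passes arguments as references to the same object (often called "call by sharing"), so each recursive call copies a pointer, not the data. The sketch below uses a plain list in place of a DataFrame so it runs without pandas, and checks with id() that the object seen at the bottom of the recursion is the very object passed in at the top. The same holds for a pandas DataFrame, as long as nothing along the way calls something like df.copy().

```python
def recurse(obj, depth):
    """Recurse `depth` levels and return the id() of the object seen at the bottom."""
    if depth == 0:
        return id(obj)
    # Passing obj again hands over a reference; no copy is made.
    return recurse(obj, depth - 1)


data = [0, 1, 2, 3, 6]  # stands in for a large DataFrame
print(recurse(data, 5) == id(data))  # True: every call saw the same object
```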

# Dummy version
data = [0, 1, 2, 3, 6]  # the real version is a fairly large DataFrame

def find_all_ns(neighbors, prev_checked):
    to_check = neighbors - prev_checked
    if len(to_check) == 0:
        return neighbors
    else:
        new_ns = set()
        for pt in to_check:
            nns = find_ns(pt)
            new_ns = new_ns.union(nns)
            prev_checked.add(pt)
        neighs = neighbors.union(new_ns)
        return find_all_ns(neighs, prev_checked)

def find_ns(pt):
    ns = set()
    for other_pt in data:  # the real version needs the full DataFrame passed in
        if abs(other_pt - pt) <= 1:
            ns.add(other_pt)
    return ns

all_ns = find_all_ns({0}, set())
print(all_ns)
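One way to receive the data only once, without a global, is a closure: the outer function takes the data as a parameter and the nested recursive helper reads it from the enclosing scope. This is a minimal sketch against the dummy version above (same distance-of-at-most-1 neighbor test); the nested name `recurse` is illustrative, and it builds new sets instead of mutating `prev_checked`.

```python
def find_all_ns(data, start):
    """Return every point reachable from `start` through chains of neighbors.

    `data` is passed in once here; the nested helpers read it from the
    enclosing scope instead of taking it as a parameter.
    """
    def find_ns(pt):
        # Same neighbor test as the dummy find_ns: distance of at most 1.
        return {other_pt for other_pt in data if abs(other_pt - pt) <= 1}

    def recurse(neighbors, prev_checked):
        to_check = neighbors - prev_checked
        if not to_check:
            return neighbors
        new_ns = set()
        for pt in to_check:
            new_ns |= find_ns(pt)
        # Build new sets rather than mutating the caller's arguments.
        return recurse(neighbors | new_ns, prev_checked | to_check)

    return recurse(set(start), set())


print(sorted(find_all_ns([0, 1, 2, 3, 6], {0})))  # [0, 1, 2, 3]
```

Because the data is an argument of the outer function, running on another dataset is just `find_all_ns(other_data, start)`, which keeps find_all_ns usable across different DataFrames.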



'''
find_neighbors is the full version of find_ns, and I can't change it so that
it no longer requires a DataFrame (df). So it seems that I have to pass
find_all_ns a DataFrame argument on every call, and this seems like it
could be a problem.
'''

def find_neighbors(dist_metric, epsi, df, pt):
    neighborhood = set()
    my_pt = df.loc[pt, :]  # pt is the index label of a row
    for index, row in df.iterrows():
        dist = dist_metric(my_pt, row)
        if dist <= epsi:  # keep only rows within distance epsi of my_pt
            neighborhood.add(index)
    return neighborhood
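Since find_neighbors takes `pt` as its last parameter, another option is `functools.partial`: bind `dist_metric`, `epsi` and `df` once, and hand the recursive function a one-argument callable, e.g. `partial(find_neighbors, dist_metric, epsi, df)`. To stay runnable without pandas, the sketch below uses a hypothetical list-based stand-in, `find_neighbors_list`, with the same argument order (points are list positions, mirroring row indices), plus a hypothetical `abs_dist` metric. The callable is still passed down the recursion, but it is one small object holding references, not a copy of the data.

```python
from functools import partial


def find_all_ns(neighbors, prev_checked, find_ns):
    """Expand `neighbors` until no unchecked points remain.

    `find_ns` is any one-argument callable mapping a point to its neighbor
    set; whatever data it needs is already bound inside it.
    """
    to_check = neighbors - prev_checked
    if not to_check:
        return neighbors
    new_ns = set()
    for pt in to_check:
        new_ns |= find_ns(pt)
    return find_all_ns(neighbors | new_ns, prev_checked | to_check, find_ns)


def find_neighbors_list(dist_metric, epsi, data, pt):
    """Hypothetical list-based stand-in for find_neighbors (same argument order)."""
    my_pt = data[pt]
    return {i for i, other in enumerate(data) if dist_metric(my_pt, other) <= epsi}


def abs_dist(a, b):
    """Hypothetical distance metric for the example."""
    return abs(a - b)


values = [0, 1, 2, 3, 6]
# Bind everything except `pt`; the result is a one-argument callable.
find_ns = partial(find_neighbors_list, abs_dist, 1, values)
print(sorted(find_all_ns({0}, set(), find_ns)))  # [0, 1, 2, 3]
```

Switching to a different DataFrame only means building a new partial; find_all_ns itself never sees the data.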
