Question

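I wrote a simple recursive function that drops the column with the largest sum until the DataFrame is reduced to the size I want. Here is the code: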

import pandas as pd

s = pd.DataFrame({'a': [1,1,1,1,1,1], 
                  'b': [2,2,2,2,2,2], 
                  'c': [3,3,3,3,3,3], 
                  'd': [4,4,4,4,4,4], 
                  'e': [5,5,5,5,5,5]}) 

def recSelect(inputdf):
    if inputdf.shape[1]<=2:
        return inputdf
    else:
        total = inputdf.sum()
        idx = total.idxmax()
        inputdf.drop(idx, axis=1, inplace=True)
        return recSelect(inputdf)

recSelect(s)

In the code above, column 'e' is dropped first, then column 'd', then 'c'. My question is: how do I correctly return each 'idx' and end up with the list ['e', 'd', 'c']?

Here is what I tried, which does not work:

idxs = [] # create an empty list
def recSelect(inputdf):
    if inputdf.shape[1]<=2:
        return inputdf
    else:
        total = inputdf.sum()
        idx = total.idxmax()
        idxs.append(idx) # append each idx
        inputdf.drop(idx, axis=1, inplace=True)
        return recSelect(inputdf), idxs

Answer 1

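Try to avoid global variables; let the recursion carry the state instead. Add an extra parameter to the function. It needs to be a list that stores the names of the removed columns, but we set its default value to None so that the list is not shared between function calls. Create the empty list on the first call and update it every time a column is dropped.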

import pandas as pd

s = pd.DataFrame({'a': [1,1,1,1,1,1], 
                  'b': [2,2,2,2,2,2], 
                  'c': [3,3,3,3,3,3], 
                  'd': [4,4,4,4,4,4], 
                  'e': [5,5,5,5,5,5]}) 

def recSelect(inputdf, removed=None):
    if removed is None:  # only create a new list when the caller did not pass one
        removed = []
    if inputdf.shape[1]<=2:
        return inputdf, removed
    else:
        total = inputdf.sum()
        idx = total.idxmax()
        inputdf.drop(idx, axis=1, inplace=True)
        removed.append(idx)
        return recSelect(inputdf, removed)

vals, removed = recSelect(s)

print(removed)

This prints:

['e', 'd', 'c']
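
To illustrate why the default is None rather than an empty list, here is a minimal sketch in plain Python. The functions `collect_shared` and `collect_fresh` are hypothetical names made up for this illustration; they are not part of pandas or of the code above.

```python
def collect_shared(item, seen=[]):
    # Pitfall: the default list is created once, when the function is defined,
    # so every call that omits `seen` appends to the same object.
    seen.append(item)
    return seen


def collect_fresh(item, seen=None):
    # With a None default, each call that omits `seen` starts with a new list.
    if seen is None:
        seen = []
    seen.append(item)
    return seen


shared_first = collect_shared("e")
shared_second = collect_shared("d")  # same list object as shared_first
fresh_first = collect_fresh("e")
fresh_second = collect_fresh("d")    # independent of fresh_first

print(shared_second)
print(fresh_second)
```

The same reasoning applies to `recSelect`: with `removed=[]` as the default, a second top-level call would start with the column names left over from the first one.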

Answer 2

You can try the following code:

import pandas as pd

your_list = list() # the tracking list

s = pd.DataFrame({'a': [1,1,1,1,1,1], 
                  'b': [2,2,2,2,2,2], 
                  'c': [3,3,3,3,3,3], 
                  'd': [4,4,4,4,4,4], 
                  'e': [5,5,5,5,5,5]}) 

def recSelect(inputdf):
    if inputdf.shape[1]<=2:
        return inputdf
    else:
        total = inputdf.sum()
        idx = total.idxmax()
        your_list.append(idx) # append the dropped idx
        inputdf.drop(idx, axis=1, inplace=True)
        return recSelect(inputdf)

recSelect(s)
print(your_list)  # there you go!

Output: ['e', 'd', 'c']
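
One caveat with a module-level list is that nothing resets it between calls. The sketch below repeats the function from this answer and runs it twice; `make_frame` is a hypothetical helper added only to rebuild the example frame, since `drop(..., inplace=True)` modifies the frame it is given.

```python
import pandas as pd

your_list = []  # module-level tracking list, as in the answer above


def recSelect(inputdf):
    if inputdf.shape[1] <= 2:
        return inputdf
    idx = inputdf.sum().idxmax()
    your_list.append(idx)  # record the column that is about to be dropped
    inputdf.drop(idx, axis=1, inplace=True)
    return recSelect(inputdf)


def make_frame():
    # Hypothetical helper: columns 'a'..'e' filled with 1..5, six rows each.
    return pd.DataFrame({c: [v] * 6 for c, v in zip("abcde", range(1, 6))})


recSelect(make_frame())
first_run = list(your_list)   # snapshot after the first call

recSelect(make_frame())
second_run = list(your_list)  # the names from the first call are still there

your_list.clear()             # reset the list before reusing the function
```

If the function may be called more than once, clearing the list first (or passing the list as a parameter) avoids mixing results from different runs.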

Answer 3

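If you are open to a different approach: why write a recursive function at all when you can sort the column sums and read the order off the index? For example:

import pandas as pd

s = pd.DataFrame({'a': [1,1,1,1,1,1],
                  'b': [2,2,2,2,2,2],
                  'c': [5,5,5,5,5,5],
                  'd': [4,4,4,4,4,4],
                  'e': [1,5,5,5,5,5]})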

sum_order = s.sum().sort_values().index
ndf = s[sum_order[:2]]
li = sum_order[2:][::-1].tolist()

Output:

ndf
   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
5  1  2
li
['c', 'e', 'd']

When working in pandas, try to avoid loops.
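
The same loop-free idea can be wrapped in a small reusable function. This is only a sketch that swaps in `Series.nlargest` for the sort above; `drop_largest` and its `keep` parameter are names invented for this illustration, and the handling of tied column sums has not been compared against the recursive version.

```python
import pandas as pd


def drop_largest(df, keep=2):
    """Return (reduced frame, dropped column names, largest sum first)."""
    n_drop = max(df.shape[1] - keep, 0)  # nothing to drop if already small enough
    dropped = df.sum().nlargest(n_drop).index.tolist()
    # drop() without inplace=True returns a new frame and leaves df untouched
    return df.drop(columns=dropped), dropped


s = pd.DataFrame({'a': [1] * 6,
                  'b': [2] * 6,
                  'c': [3] * 6,
                  'd': [4] * 6,
                  'e': [5] * 6})

reduced, dropped = drop_largest(s)
print(dropped)
print(list(reduced.columns))
```

Unlike the recursive versions, this leaves the original frame `s` unchanged, which may or may not be what you want.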

How do I collect the intermediate results of a recursive function in a list?

3 answers: