Question

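For a given DataFrame df, imported from a CSV file and containing redundant data (columns), I would like to write a function that filters df.columns and then renames the selected columns, repeating this for as many groups of arguments as are passed in.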

Ideally, the function should work as follows. When the input is (df, 'string1a', 'string1b', 'new_col_name1'), then:

filter1 = [col for col in df.columns if 'string1a' in col and 'string1b' in col]
df_out = df[filter1]
df_out.columns = ['new_col_name1']
return df_out

However, when the input is (df, 'string1a', 'string1b', 'new_col_name1', 'string2a', 'string2b', 'new_col_name2', 'string3a', 'string3b', 'new_col_name3'), the function should return:

filter1 = [col for col in df.columns if 'string1a' in col and 'string1b' in col]
filter2 = [col for col in df.columns if 'string2a' in col and 'string2b' in col]
filter3 = [col for col in df.columns if 'string3a' in col and 'string3b' in col]

df_out = df[filter1 + filter2 + filter3]
df_out.columns = ['new_col_name1', 'new_col_name2', 'new_col_name3']
return df_out

Answer 1

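I think you can use a dictionary to define the values and then apply np.logical_and.reduce, because each list holds several substrings that all have to be found in a column name: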

import numpy as np
import pandas as pd

df = pd.DataFrame({'aavfb': list('abcdef'),
                   'cedf': [4, 5, 4, 5, 5, 4],
                   'd': [7, 8, 9, 4, 2, 3],
                   'c': [1, 3, 5, 7, 1, 0],
                   'abds': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})

print (df)
   F aavfb  abds  c  cedf  d
0  a     a     5  1     4  7
1  a     b     3  3     5  8
2  a     c     6  5     4  9
3  b     d     9  7     5  4
4  b     e     2  1     5  2
5  b     f     4  0     4  3

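def rename1(df, d):
    # work on a copy so the caller's DataFrame is not renamed in place
    df = df.copy()
    # loop over the dict: k is the new name, v the list of required substrings
    for k, v in d.items():
        # mask of the columns whose name contains every substring in v
        # (regex=False so the substrings are matched literally)
        m = np.logical_and.reduce([df.columns.str.contains(x, regex=False) for x in v])
        # rename the matching columns and leave the others unchanged
        df.columns = np.where(m, k, df.columns)

    # keep only the columns that now carry one of the dict keys as their name
    return df.loc[:, df.columns.isin(list(d))]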

d = {'new_col_name1':['a', 'b'],
     'new_col_name2':['c', 'd']}   

print (rename1(df, d))

   new_col_name1  new_col_name1  new_col_name2
0              a              5              4
1              b              3              5
2              c              6              4
3              d              9              5
4              e              2              5
5              f              4              4
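
If you want the exact call style from the question, with the arguments passed as flat (substring, substring, new name) triples, the same idea could be sketched roughly as below. This is only an illustration under assumptions: filter_rename is a hypothetical name, the column labels are assumed to be strings, and every filter is evaluated against the original column names.

```python
import pandas as pd

def filter_rename(df, *args):
    """Hypothetical helper: args are consecutive (substr_a, substr_b, new_name) triples."""
    if not args or len(args) % 3 != 0:
        raise ValueError("expected one or more (substr_a, substr_b, new_name) triples")
    pieces = []
    for i in range(0, len(args), 3):
        a, b, new_name = args[i:i + 3]
        # always match against the ORIGINAL column names (assumed to be strings)
        matches = [col for col in df.columns if a in col and b in col]
        part = df[matches].copy()
        # every matched column gets the same new name, so duplicates are possible
        part.columns = [new_name] * len(matches)
        pieces.append(part)
    # put the selected pieces side by side, in the order the triples were given
    return pd.concat(pieces, axis=1)

df = pd.DataFrame({'aavfb': list('abcdef'),
                   'cedf': [4, 5, 4, 5, 5, 4],
                   'd': [7, 8, 9, 4, 2, 3],
                   'c': [1, 3, 5, 7, 1, 0],
                   'abds': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})

out = filter_rename(df, 'a', 'b', 'new_col_name1', 'c', 'd', 'new_col_name2')
print(list(out.columns))
# ['new_col_name1', 'new_col_name1', 'new_col_name2']
```

Because the filters only ever look at the original names, a new name such as 'new_col_name1' (which itself contains 'c') cannot be picked up by a later filter. As in the output above, a filter that matches more than one column produces duplicate column names, so you may want to add a check if each filter is supposed to select exactly one column.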

Write a function to filter and rename multiple DataFrame columns based on variable input
