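For a given DataFrame df that was imported from a CSV file and contains redundant data (columns), I would like to write a function that performs recursive filtering of df.columns followed by renaming, depending on the number of arguments given.
Ideally, the function should work as follows.
When the input is (df, 'string1a', 'string1b', 'new_col_name1'), then: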
filter1 = [col for col in df.columns if 'string1a' in col and 'string1b' in col]
df_out = df[filter1]
df_out.columns = ['new_col_name1']
return df_out
However, when the input is:
(df, 'string1a', 'string1b', 'new_col_name1', 'string2a', 'string2b', 'new_col_name2', 'string3a', 'string3b', 'new_col_name3')
the function should return:
filter1 = [col for col in df.columns if 'string1a' in col and 'string1b' in col]
filter2 = [col for col in df.columns if 'string2a' in col and 'string2b' in col]
filter3 = [col for col in df.columns if 'string3a' in col and 'string3b' in col]
df_out = df[filter1 + filter2 + filter3]
df_out.columns = ['new_col_name1', 'new_col_name2', 'new_col_name3']
return df_out
Answer 0 (score: 1)
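I think you can use a dictionary to define the values and then build the column mask with np.logical_and.reduce, because multiple values in each list have to be checked: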
import numpy as np
import pandas as pd

df = pd.DataFrame({'aavfb': list('abcdef'),
                   'cedf': [4, 5, 4, 5, 5, 4],
                   'd': [7, 8, 9, 4, 2, 3],
                   'c': [1, 3, 5, 7, 1, 0],
                   'abds': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})
print(df)
   F aavfb  abds  c  cedf  d
0  a     a     5  1     4  7
1  a     b     3  3     5  8
2  a     c     6  5     4  9
3  b     d     9  7     5  4
4  b     e     2  1     5  2
5  b     f     4  0     4  3
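The mask that np.logical_and.reduce produces can be illustrated on the column names above. This is only a minimal sketch: the names cols, masks and combined are illustrative, and regex=False is my assumption that the substrings are meant literally, as in the question's `'string1a' in col` test.

```python
import numpy as np
import pandas as pd

cols = pd.Index(['aavfb', 'cedf', 'd', 'c', 'abds', 'F'])

# One boolean mask per substring: True where the column name contains it
masks = [cols.str.contains(x, regex=False) for x in ['a', 'b']]

# Element-wise AND across all masks: True only where every substring occurs
combined = np.logical_and.reduce(masks)

print(cols[combined].tolist())  # ['aavfb', 'abds']
```

The function below builds the same kind of mask once for each dictionary entry.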
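def rename1(df, d):
    # work on a copy so the caller's DataFrame keeps its original column names
    df = df.copy()
    # loop over the dict: key = new column name, value = list of substrings
    for k, v in d.items():
        # mask of columns whose names contain all substrings in the list
        m = np.logical_and.reduce([df.columns.str.contains(x) for x in v])
        # set the new column name wherever the mask is True
        df.columns = np.where(m, k, df.columns)
    # keep only the columns that were renamed to one of the dict keys
    return df.loc[:, df.columns.isin(d.keys())]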
d = {'new_col_name1': ['a', 'b'],
     'new_col_name2': ['c', 'd']}
print(rename1(df, d))
  new_col_name1  new_col_name1  new_col_name2
0             a              5              4
1             b              3              5
2             c              6              4
3             d              9              5
4             e              2              5
5             f              4              4
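If the flat call signature from the question is preferred over a dictionary, a variadic wrapper is one option. The following is a rough sketch, not part of the answer above: filter_rename is a hypothetical name, and raising an error when a pair of substrings does not match exactly one column is my assumption about the desired behaviour (the question implies one column per new name).

```python
import pandas as pd

def filter_rename(df, *args):
    """Hypothetical helper: args are flat triples (string_a, string_b, new_name)."""
    if len(args) % 3 != 0:
        raise ValueError("expected triples of (string_a, string_b, new_name)")
    old_names, new_names = [], []
    for i in range(0, len(args), 3):
        a, b, new_name = args[i:i + 3]
        # same literal substring test as in the question
        matches = [col for col in df.columns if a in col and b in col]
        if len(matches) != 1:
            # assumption: each pair of substrings should identify exactly one column
            raise ValueError(f"{a!r} and {b!r} match {len(matches)} columns, expected 1")
        old_names.append(matches[0])
        new_names.append(new_name)
    # select on a copy so the input DataFrame is left untouched
    df_out = df[old_names].copy()
    df_out.columns = new_names
    return df_out

df = pd.DataFrame({'aavfb': list('abcdef'),
                   'cedf': [4, 5, 4, 5, 5, 4],
                   'd': [7, 8, 9, 4, 2, 3]})
out = filter_rename(df, 'aa', 'fb', 'new_col_name1', 'ce', 'df', 'new_col_name2')
print(out.columns.tolist())  # ['new_col_name1', 'new_col_name2']
```

Unlike the dictionary version, this sketch refuses ambiguous matches instead of producing duplicate column names such as the two new_col_name1 columns in the output above.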